[4/4,Aarch64] v2: Implement Aarch64 SIMD ABI

Message ID 1541699749.12016.9.camel@cavium.com
State New
Series
  • v2: Implement Aarch64 SIMD ABI

Commit Message

Steve Ellcey Nov. 8, 2018, 5:55 p.m.
This is patch 4 of the series to support the Aarch64 SIMD ABI [1] in GCC.

It defines a new target hook, targetm.check_part_clobbered, that
takes an rtx_insn and checks whether it is a call to a function
that may partially clobber registers.  It returns true by default,
which preserves the current behaviour, but if we can determine
that the function will not do any partial clobbers (like the
Aarch64 SIMD functions) then it returns false.
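
As a rough illustration of the hook contract described here, the following
standalone sketch models a default that always answers "may partially
clobber" and a target override that answers "no" for SIMD-ABI callees.  It is
not GCC code: struct toy_call, the callee_is_simd flag and every toy_*
function are hypothetical stand-ins for rtx_insn and the real hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a call instruction; not GCC's rtx_insn.  */
struct toy_call {
  bool callee_is_simd;  /* assumed flag: callee follows the SIMD ABI */
};

typedef bool (*toy_hook) (const struct toy_call *);

/* Default behaviour: assume every call may partially clobber registers.  */
static bool
toy_default_check_part_clobbered (const struct toy_call *call)
{
  (void) call;
  return true;
}

/* Target override: a known SIMD-ABI callee does no partial clobbers.  */
static bool
toy_aarch64_check_part_clobbered (const struct toy_call *call)
{
  return !call->callee_is_simd;
}

/* Return true if any call in the block may partially clobber registers.  */
static bool
toy_block_has_partial_clobber (toy_hook hook, const struct toy_call *calls,
                               size_t n)
{
  assert (hook != NULL);
  for (size_t i = 0; i < n; i++)
    if (hook (&calls[i]))
      return true;
  return false;
}
```

With the default hook every block that contains a call is treated
conservatively; with the override, only blocks that contain at least one
non-SIMD call are.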

Steve Ellcey
sellcey@cavium.com



2018-11-08  Steve Ellcey  <sellcey@cavium.com>

	* config/aarch64/aarch64.c (aarch64_check_part_clobbered): New function.
	(TARGET_CHECK_PART_CLOBBERED): New macro.
	* doc/tm.texi.in (TARGET_CHECK_PART_CLOBBERED): New hook.
	* lra-constraints.c (need_for_call_save_p): Use check_part_clobbered.
	* lra-int.h (check_part_clobbered): New field in lra_reg struct.
	* lra-lives.c (check_pseudos_live_through_calls): Pass in
	check_partial_clobber bool argument and use it.
	(process_bb_lives): Check basic block for functions that may do
	partial clobbers.  Pass this to check_pseudos_live_through_calls.
	* lra.c (initialize_lra_reg_info_element): Initialize
	check_part_clobbered to false.
	* target.def (check_part_clobbered): New target hook.

Comments

Richard Sandiford Dec. 6, 2018, 12:25 p.m. | #1
Steve Ellcey <sellcey@cavium.com> writes:
> This is patch 4 of the series to support the Aarch64 SIMD ABI [1] in GCC.
>
> It defines a new target hook, targetm.check_part_clobbered, that
> takes an rtx_insn and checks whether it is a call to a function
> that may partially clobber registers.  It returns true by default,
> which preserves the current behaviour, but if we can determine
> that the function will not do any partial clobbers (like the
> Aarch64 SIMD functions) then it returns false.

Sorry, have a feeling this is going to be at least partly going
back on what I said before, but...

The patch only really deals with one user of the part-clobbered info,
namely LRA.  And as it happens, that caller does have access to the
relevant call insns (which was a concern before), since you walk them in:

  /* Check to see if any call might do a partial clobber.  */
  partial_clobber_in_bb = false;
  FOR_BB_INSNS_REVERSE_SAFE (bb, curr_insn, next)
    {
      if (CALL_P (curr_insn)
          && targetm.check_part_clobbered (curr_insn))
        {
          partial_clobber_in_bb = true;
          break;
        }
    }

Since we're looking at the call insns anyway, we could have a hook that
"jousts" two calls and picks the one that preserves *fewer* registers.
This would mean that loop produces a single instruction that conservatively
describes the call-preserved registers.  We could then stash that
instruction in lra_reg instead of the current check_part_clobbered
boolean.

The hook should by default be a null pointer, so that we can avoid
the instruction walk on targets that don't need it.
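
To make the "jousting" idea concrete, here is a minimal standalone sketch,
not GCC code.  The struct toy_call type, its preserved_bytes field and the
toy_* functions are assumptions made for illustration; the real hook would
take rtx_insn pointers.  The reduction keeps whichever call preserves less,
and a null hook pointer skips the walk entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a call instruction.  */
struct toy_call {
  int id;
  int preserved_bytes;  /* assumed: low bytes of a vector reg the callee keeps */
};

typedef const struct toy_call *(*toy_joust_hook) (const struct toy_call *,
                                                  const struct toy_call *);

/* Return whichever call preserves less.  CALL_2 may be NULL.  */
static const struct toy_call *
toy_joust (const struct toy_call *call_1, const struct toy_call *call_2)
{
  assert (call_1 != NULL);
  if (call_2 == NULL || call_1->preserved_bytes <= call_2->preserved_bytes)
    return call_1;
  return call_2;
}

/* Reduce all calls in a block to the single most conservative one.
   A null HOOK models a target that does not need the walk.  */
static const struct toy_call *
toy_worst_call (toy_joust_hook hook, const struct toy_call *calls, size_t n)
{
  const struct toy_call *worst = NULL;

  if (hook == NULL)
    return NULL;
  for (size_t i = 0; i < n; i++)
    worst = hook (&calls[i], worst);
  return worst;
}
```

The result is a single call that can stand in for every call in the block
when asking what is preserved, which is the value that would be stored in
place of the boolean.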

That would mean that LRA would always have a call instruction to hand
when asking about call-preserved information.  So I think we should
add an insn parameter to targetm.hard_regno_call_part_clobbered,
with a null insn selecting the default behaviour.  I know it's
going to be a pain to update all callers and targets, sorry.

This would also cope with the fact that, when SVE is enabled, SIMD
functions *do* still part-clobber the registers, just in a wider mode.
The current patch doesn't handle that, and it would be hard to fix without
pessimistically treating the functions as clobbering above 64 bits
rather than 128 bits.
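
The SVE point can be pictured as a threshold on the mode size rather than a
yes/no property of the call.  The sketch below is a simplified model, not the
AArch64 back end: the toy_abi enum and both functions are hypothetical, it
ignores which registers are affected, and it assumes only the two preserved
widths mentioned in this thread (64 bits for the base ABI, 128 bits for SIMD
functions).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ABI tags for the two calling conventions discussed.  */
enum toy_abi { TOY_ABI_BASE, TOY_ABI_SIMD };

/* Bytes of a callee-saved vector register that each ABI preserves.  */
static unsigned int
toy_preserved_bytes (enum toy_abi abi)
{
  return abi == TOY_ABI_SIMD ? 16 : 8;
}

/* A value is part-clobbered when it is wider than what the callee keeps.  */
static bool
toy_part_clobbered (enum toy_abi abi, unsigned int mode_bytes)
{
  assert (mode_bytes > 0);
  return mode_bytes > toy_preserved_bytes (abi);
}
```

Under this model a 16-byte value survives a SIMD call intact, but a 32-byte
SVE value does not, which a hook that returns false for every SIMD call
regardless of mode cannot express.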

(Really, it would be good to overhaul the whole handling of ABIs
so that we have all the information about an ABI in one structure
and can ask "what ABI does this call use"?  But that's a lot of work.
The above should be good enough as long as the call-preserved behaviour
of ABIs follows a total ordering, which it does for AArch64.)
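
The total-ordering requirement can be stated as: for any two ABIs, the set of
things one preserves is contained in the set the other preserves.  The
following standalone check is only a sketch of that condition; representing
each ABI as a 32-bit mask of fully preserved registers is an assumption made
for illustration, and the toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit i set means hypothetical register i is fully preserved.  */
static bool
toy_subset (uint32_t a, uint32_t b)
{
  return (a & ~b) == 0;
}

/* Return true if every pair of ABIs is comparable by set inclusion,
   so that "the call that preserves fewer registers" is well defined.  */
static bool
toy_totally_ordered (const uint32_t *preserved, size_t n)
{
  assert (preserved != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    for (size_t j = i + 1; j < n; j++)
      if (!toy_subset (preserved[i], preserved[j])
          && !toy_subset (preserved[j], preserved[i]))
        return false;
  return true;
}
```

If two ABIs each preserve something the other does not, no single call can
conservatively describe both, and picking one winner stops being safe.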

Thanks,
Richard
Steve Ellcey Jan. 4, 2019, 11:57 p.m. | #2
On Thu, 2018-12-06 at 12:25 +0000, Richard Sandiford wrote:
> 
> Since we're looking at the call insns anyway, we could have a hook that
> "jousts" two calls and picks the one that preserves *fewer* registers.
> This would mean that loop produces a single instruction that conservatively
> describes the call-preserved registers.  We could then stash that
> instruction in lra_reg instead of the current check_part_clobbered
> boolean.
> 
> The hook should by default be a null pointer, so that we can avoid
> the instruction walk on targets that don't need it.
> 
> That would mean that LRA would always have a call instruction to hand
> when asking about call-preserved information.  So I think we should
> add an insn parameter to targetm.hard_regno_call_part_clobbered,
> with a null insn selecting the default behaviour.  I know it's
> going to be a pain to update all callers and targets, sorry.

Richard, here is an updated version of this patch.  It is not
completely tested yet, but I wanted to send it out to make
sure it is what you had in mind and to see if you had any comments about
the new target function while I am testing it (including building
some of the other targets).

Steve Ellcey
sellcey@cavium.com


2019-01-04  Steve Ellcey  <sellcey@marvell.com>

	* config/aarch64/aarch64.c (aarch64_simd_call_p): New function.
	(aarch64_hard_regno_call_part_clobbered): Add insn argument.
	(aarch64_return_call_with_max_clobbers): New function.
	(TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New macro.
	* config/avr/avr.c (avr_hard_regno_call_part_clobbered): Add insn
	argument.
	* config/i386/i386.c (ix86_hard_regno_call_part_clobbered): Ditto.
	* config/mips/mips.c (mips_hard_regno_call_part_clobbered): Ditto.
	* config/rs6000/rs6000.c (rs6000_hard_regno_call_part_clobbered): Ditto.
	* config/s390/s390.c (s390_hard_regno_call_part_clobbered): Ditto.
	* cselib.c (cselib_process_insn): Add argument to
	targetm.hard_regno_call_part_clobbered call.
	* ira-conflicts.c (ira_build_conflicts): Ditto.
	* ira-costs.c (ira_tune_allocno_costs): Ditto.
	* lra-constraints.c (inherit_reload_reg): Ditto, plus refactor
	return statement.
	* lra-int.h (struct lra_reg): Add call_insn field.
	* lra-lives.c (check_pseudos_live_through_calls): Add call_insn
	argument.  Add argument to targetm.hard_regno_call_part_clobbered
	call.
	(process_bb_lives): Use new target function
	targetm.return_call_with_max_clobbers to set call_insn.
	Pass call_insn to check_pseudos_live_through_calls.
	Set call_insn in lra_reg_info.
	* lra.c (initialize_lra_reg_info_element): Set call_insn to NULL.
	* regcprop.c (copyprop_hardreg_forward_1): Add argument to
	targetm.hard_regno_call_part_clobbered call.
	* reginfo.c (choose_hard_reg_mode): Ditto.
	* regrename.c (check_new_reg_p): Ditto.
	* reload.c (find_equiv_reg): Ditto.
	* reload1.c (emit_reload_insns): Ditto.
	* sched-deps.c (deps_analyze_insn): Ditto.
	* sel-sched.c (init_regs_for_mode): Ditto.
	(mark_unavailable_hard_regs): Ditto.
	* targhooks.c (default_dwarf_frame_reg_mode): Ditto.
	* target.def (hard_regno_call_part_clobbered): Add insn argument.
	(return_call_with_max_clobbers): New target function.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in (TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New hook.
	* hooks.c (hook_bool_uint_mode_false): Change to
	hook_bool_insn_uint_mode_false.
	* hooks.h (hook_bool_uint_mode_false): Ditto.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c5036c8..87af31b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1565,16 +1565,55 @@ aarch64_reg_save_mode (tree fndecl, unsigned regno)
 	   : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode);
 }
 
+/* Return true if the instruction is a call to a SIMD function, false
+   if it is not a SIMD function or if we do not know anything about
+   the function.  */
+
+static bool
+aarch64_simd_call_p (rtx_insn *insn)
+{
+  rtx symbol;
+  rtx call;
+  tree fndecl;
+
+  gcc_assert (CALL_P (insn));
+  call = get_call_rtx_from (insn);
+  symbol = XEXP (XEXP (call, 0), 0);
+  if (GET_CODE (symbol) != SYMBOL_REF)
+    return false;
+  fndecl = SYMBOL_REF_DECL (symbol);
+  if (!fndecl)
+    return false;
+
+  return aarch64_simd_decl_p (fndecl);
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
    the lower 64 bits of a 128-bit register.  Tell the compiler the callee
    clobbers the top 64 bits when restoring the bottom 64 bits.  */
 
 static bool
-aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+aarch64_hard_regno_call_part_clobbered (rtx_insn *insn,
+					unsigned int regno,
+					machine_mode mode)
 {
+  if (insn && CALL_P (insn) && aarch64_simd_call_p (insn))
+    return false;
   return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);
 }
 
+/* Implement TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.  */
+
+rtx_insn *
+aarch64_return_call_with_max_clobbers (rtx_insn *call_1, rtx_insn *call_2)
+{
+  gcc_assert (CALL_P (call_1));
+  if ((call_2 == NULL_RTX) || aarch64_simd_call_p (call_2))
+    return call_1;
+  else
+    return call_2;
+}
+
 /* Implement REGMODE_NATURAL_SIZE.  */
 poly_uint64
 aarch64_regmode_natural_size (machine_mode mode)
@@ -18524,6 +18563,10 @@ aarch64_libgcc_floating_mode_supported_p
 #define TARGET_HARD_REGNO_CALL_PART_CLOBBERED \
   aarch64_hard_regno_call_part_clobbered
 
+#undef TARGET_RETURN_CALL_WITH_MAX_CLOBBERS
+#define TARGET_RETURN_CALL_WITH_MAX_CLOBBERS \
+  aarch64_return_call_with_max_clobbers
+
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT aarch64_constant_alignment
 
diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
index 023308b..2cf993d 100644
--- a/gcc/config/avr/avr.c
+++ b/gcc/config/avr/avr.c
@@ -12181,7 +12181,9 @@ avr_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  */
 
 static bool
-avr_hard_regno_call_part_clobbered (unsigned regno, machine_mode mode)
+avr_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				    unsigned regno,
+				    machine_mode mode)
 {
   /* FIXME: This hook gets called with MODE:REGNO combinations that don't
         represent valid hard registers like, e.g. HI:29.  Returning TRUE
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 016d6e3..78dc720 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -40161,7 +40161,9 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
    the low 16 bytes are saved.  */
 
 static bool
-ix86_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+ix86_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				     unsigned int regno,
+				     machine_mode mode)
 {
   return SSE_REGNO_P (regno) && GET_MODE_SIZE (mode) > 16;
 }
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 95dc946..05a2ade 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -12906,7 +12906,9 @@ mips_hard_regno_scratch_ok (unsigned int regno)
    registers with MODE > 64 bits are part clobbered too.  */
 
 static bool
-mips_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+mips_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				     unsigned int regno,
+				     machine_mode mode)
 {
   if (TARGET_FLOATXX
       && hard_regno_nregs (regno, mode) == 1
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a257554..6d10d24 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -2197,7 +2197,9 @@ rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  */
 
 static bool
-rs6000_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+rs6000_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				       unsigned int regno,
+				       machine_mode mode)
 {
   if (TARGET_32BIT
       && TARGET_POWERPC64
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index ea2be10..5f941d9 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -10098,7 +10098,9 @@ s390_hard_regno_scratch_ok (unsigned int regno)
    bytes are saved across calls, however.  */
 
 static bool
-s390_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+s390_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				     unsigned int regno,
+				     machine_mode mode)
 {
   if (!TARGET_64BIT
       && TARGET_ZARCH
diff --git a/gcc/cselib.c b/gcc/cselib.c
index cef4bc0..84c17c2 100644
--- a/gcc/cselib.c
+++ b/gcc/cselib.c
@@ -2770,7 +2770,7 @@ cselib_process_insn (rtx_insn *insn)
 	if (call_used_regs[i]
 	    || (REG_VALUES (i) && REG_VALUES (i)->elt
 		&& (targetm.hard_regno_call_part_clobbered
-		    (i, GET_MODE (REG_VALUES (i)->elt->val_rtx)))))
+		    (insn, i, GET_MODE (REG_VALUES (i)->elt->val_rtx)))))
 	  cselib_invalidate_regno (i, reg_raw_mode[i]);
 
       /* Since it is not clear how cselib is going to be used, be
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index edc0902..07171e5 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -1894,7 +1894,7 @@ of @code{CALL_USED_REGISTERS}.
 @cindex call-used register
 @cindex call-clobbered register
 @cindex call-saved register
-@deftypefn {Target Hook} bool TARGET_HARD_REGNO_CALL_PART_CLOBBERED (unsigned int @var{regno}, machine_mode @var{mode})
+@deftypefn {Target Hook} bool TARGET_HARD_REGNO_CALL_PART_CLOBBERED (rtx_insn *@var{}, unsigned int @var{regno}, machine_mode @var{mode})
 This hook should return true if @var{regno} is partly call-saved and
 partly call-clobbered, and if a value of mode @var{mode} would be partly
 clobbered by a call.  For example, if the low 32 bits of @var{regno} are
@@ -1905,6 +1905,17 @@ The default implementation returns false, which is correct
 for targets that don't have partly call-clobbered registers.
 @end deftypefn
 
+@deftypefn {Target Hook} {rtx_insn *} TARGET_RETURN_CALL_WITH_MAX_CLOBBERS (rtx_insn *@var{call_1}, rtx_insn *@var{call_2})
+This hook returns a pointer to the call that partially clobbers the
+most registers.  If a platform supports multiple ABIs where the registers
+that are partially clobbered may vary, this function compares
+two calls and returns a pointer to the one that clobbers the most registers.
+
+The registers clobbered in different ABIs must be a proper subset or
+superset of all other ABIs.  @var{call_1} must always be a call insn,
+@var{call_2} may be NULL or a call insn.
+@end deftypefn
+
 @findex fixed_regs
 @findex call_used_regs
 @findex global_regs
@@ -2919,7 +2930,7 @@ the local anchor could be shared by other accesses to nearby locations.
 
 The hook returns true if it succeeds, storing the offset of the
 anchor from the base in @var{offset1} and the offset of the final address
-from the anchor in @var{offset2}.  The default implementation returns false.
+from the anchor in @var{offset2}.  The default implementation returns false.
 @end deftypefn
 
 @deftypefn {Target Hook} reg_class_t TARGET_SPILL_CLASS (reg_class_t, @var{machine_mode})
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 976a700..97a2ade 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -1707,6 +1707,8 @@ of @code{CALL_USED_REGISTERS}.
 @cindex call-saved register
 @hook TARGET_HARD_REGNO_CALL_PART_CLOBBERED
 
+@hook TARGET_RETURN_CALL_WITH_MAX_CLOBBERS
+
 @findex fixed_regs
 @findex call_used_regs
 @findex global_regs
diff --git a/gcc/hooks.c b/gcc/hooks.c
index bbc35fc..f95659b 100644
--- a/gcc/hooks.c
+++ b/gcc/hooks.c
@@ -142,7 +142,7 @@ hook_bool_puint64_puint64_true (poly_uint64, poly_uint64)
 
 /* Generic hook that takes (unsigned int, machine_mode) and returns false.  */
 bool
-hook_bool_uint_mode_false (unsigned int, machine_mode)
+hook_bool_insn_uint_mode_false (rtx_insn *, unsigned int, machine_mode)
 {
   return false;
 }
diff --git a/gcc/hooks.h b/gcc/hooks.h
index 9e4bc29..dc6b2e1 100644
--- a/gcc/hooks.h
+++ b/gcc/hooks.h
@@ -40,7 +40,9 @@ extern bool hook_bool_const_rtx_insn_const_rtx_insn_true (const rtx_insn *,
 extern bool hook_bool_mode_uhwi_false (machine_mode,
 				       unsigned HOST_WIDE_INT);
 extern bool hook_bool_puint64_puint64_true (poly_uint64, poly_uint64);
-extern bool hook_bool_uint_mode_false (unsigned int, machine_mode);
+extern bool hook_bool_insn_uint_mode_false (rtx_insn *,
+					    unsigned int,
+					    machine_mode);
 extern bool hook_bool_uint_mode_true (unsigned int, machine_mode);
 extern bool hook_bool_tree_false (tree);
 extern bool hook_bool_const_tree_false (const_tree);
diff --git a/gcc/ira-conflicts.c b/gcc/ira-conflicts.c
index b57468b..b697e57 100644
--- a/gcc/ira-conflicts.c
+++ b/gcc/ira-conflicts.c
@@ -808,7 +808,8 @@ ira_build_conflicts (void)
 		 regs must conflict with them.  */
 	      for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 		if (!TEST_HARD_REG_BIT (call_used_reg_set, regno)
-		    && targetm.hard_regno_call_part_clobbered (regno,
+		    && targetm.hard_regno_call_part_clobbered (NULL,
+							       regno,
 							       obj_mode))
 		  {
 		    SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), regno);
diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c
index e5d8804..7f60712 100644
--- a/gcc/ira-costs.c
+++ b/gcc/ira-costs.c
@@ -2379,7 +2379,8 @@ ira_tune_allocno_costs (void)
 						   *crossed_calls_clobber_regs)
 		  && (ira_hard_reg_set_intersection_p (regno, mode,
 						       call_used_reg_set)
-		      || targetm.hard_regno_call_part_clobbered (regno,
+		      || targetm.hard_regno_call_part_clobbered (NULL,
+								 regno,
 								 mode)))
 		cost += (ALLOCNO_CALL_FREQ (a)
 			 * (ira_memory_move_cost[mode][rclass][0]
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 7ffcd35..31a567a 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -5368,16 +5368,24 @@ inherit_reload_reg (bool def_p, int original_regno,
 static inline bool
 need_for_call_save_p (int regno)
 {
+  machine_mode pmode = PSEUDO_REGNO_MODE (regno);
+  int new_regno = reg_renumber[regno];
+
   lra_assert (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0);
-  return (usage_insns[regno].calls_num < calls_num
-	  && (overlaps_hard_reg_set_p
-	      ((flag_ipa_ra &&
-		! hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
-	       ? lra_reg_info[regno].actual_call_used_reg_set
-	       : call_used_reg_set,
-	       PSEUDO_REGNO_MODE (regno), reg_renumber[regno])
-	      || (targetm.hard_regno_call_part_clobbered
-		  (reg_renumber[regno], PSEUDO_REGNO_MODE (regno)))));
+
+  if (usage_insns[regno].calls_num >= calls_num)
+    return false;
+
+  if (flag_ipa_ra
+      && !hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
+    return (overlaps_hard_reg_set_p
+		(lra_reg_info[regno].actual_call_used_reg_set, pmode, new_regno)
+	    || targetm.hard_regno_call_part_clobbered
+		(lra_reg_info[regno].call_insn, new_regno, pmode));
+  else
+    return (overlaps_hard_reg_set_p (call_used_reg_set, pmode, new_regno)
+            || targetm.hard_regno_call_part_clobbered
+		(lra_reg_info[regno].call_insn, new_regno, pmode));
 }
 
 /* Global registers occurring in the current EBB.  */
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 9d9e81d..ccc7b00 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -117,6 +117,8 @@ struct lra_reg
   /* This member is set up in lra-lives.c for subsequent
      assignments.  */
   lra_copy_t copies;
+  /* Call instruction that may affect this register.  */
+  rtx_insn *call_insn;
 };
 
 /* References to the common info about each register.  */
diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
index 7b60691..fafb9e3 100644
--- a/gcc/lra-lives.c
+++ b/gcc/lra-lives.c
@@ -579,7 +579,8 @@ lra_setup_reload_pseudo_preferenced_hard_reg (int regno,
    PSEUDOS_LIVE_THROUGH_CALLS and PSEUDOS_LIVE_THROUGH_SETJUMPS.  */
 static inline void
 check_pseudos_live_through_calls (int regno,
-				  HARD_REG_SET last_call_used_reg_set)
+				  HARD_REG_SET last_call_used_reg_set,
+				  rtx_insn *call_insn)
 {
   int hr;
 
@@ -590,7 +591,8 @@ check_pseudos_live_through_calls (int regno,
 		    last_call_used_reg_set);
 
   for (hr = 0; HARD_REGISTER_NUM_P (hr); hr++)
-    if (targetm.hard_regno_call_part_clobbered (hr,
+    if (targetm.hard_regno_call_part_clobbered (call_insn,
+						hr,
 						PSEUDO_REGNO_MODE (regno)))
       add_to_hard_reg_set (&lra_reg_info[regno].conflict_hard_regs,
 			   PSEUDO_REGNO_MODE (regno), hr);
@@ -635,6 +637,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
   rtx link, *link_loc;
   bool need_curr_point_incr;
   HARD_REG_SET last_call_used_reg_set;
+  rtx_insn *call_insn;
 
   reg_live_out = df_get_live_out (bb);
   sparseset_clear (pseudos_live);
@@ -658,6 +661,17 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
   if (lra_dump_file != NULL)
     fprintf (lra_dump_file, "  BB %d\n", bb->index);
 
+  call_insn = NULL;
+  if (targetm.return_call_with_max_clobbers)
+    {
+      FOR_BB_INSNS_REVERSE_SAFE (bb, curr_insn, next)
+	{
+	  if (CALL_P (curr_insn))
+	    call_insn = targetm.return_call_with_max_clobbers (curr_insn,
+							       call_insn);
+	}
+    }
+
   /* Scan the code of this basic block, noting which pseudos and hard
      regs are born or die.
 
@@ -847,7 +861,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 	      update_pseudo_point (reg->regno, curr_point, USE_POINT);
 	      mark_regno_live (reg->regno, reg->biggest_mode);
 	      check_pseudos_live_through_calls (reg->regno,
-						last_call_used_reg_set);
+						last_call_used_reg_set,
+						call_insn);
 	    }
 
 	  if (!HARD_REGISTER_NUM_P (reg->regno))
@@ -912,9 +927,13 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 		{
 		  IOR_HARD_REG_SET (lra_reg_info[j].actual_call_used_reg_set,
 				    this_call_used_reg_set);
+
+		  lra_reg_info[j].call_insn = curr_insn;
+
 		  if (flush)
-		    check_pseudos_live_through_calls
-		      (j, last_call_used_reg_set);
+		    check_pseudos_live_through_calls (j,
+						      last_call_used_reg_set,
+						      call_insn);
 		}
 	      COPY_HARD_REG_SET(last_call_used_reg_set, this_call_used_reg_set);
 	    }
@@ -956,7 +975,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 	      update_pseudo_point (reg->regno, curr_point, USE_POINT);
 	    mark_regno_live (reg->regno, reg->biggest_mode);
 	    check_pseudos_live_through_calls (reg->regno,
-					      last_call_used_reg_set);
+					      last_call_used_reg_set,
+					      call_insn);
 	  }
 
       for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
@@ -1125,7 +1145,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
       if (sparseset_cardinality (pseudos_live_through_calls) == 0)
 	break;
       if (sparseset_bit_p (pseudos_live_through_calls, j))
-	check_pseudos_live_through_calls (j, last_call_used_reg_set);
+	check_pseudos_live_through_calls (j, last_call_used_reg_set, call_insn);
     }
 
   for (i = 0; HARD_REGISTER_NUM_P (i); ++i)
diff --git a/gcc/lra.c b/gcc/lra.c
index 75ee742..b0e999f 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -1344,6 +1344,7 @@ initialize_lra_reg_info_element (int i)
   lra_reg_info[i].val = get_new_reg_value ();
   lra_reg_info[i].offset = 0;
   lra_reg_info[i].copies = NULL;
+  lra_reg_info[i].call_insn = NULL;
 }
 
 /* Initialize common reg info and copies.  */
diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index b107ea2..e6bdeb0 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -1054,7 +1054,7 @@ copyprop_hardreg_forward_1 (basic_block bb, struct value_data *vd)
 	  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 	    if ((TEST_HARD_REG_BIT (regs_invalidated_by_this_call, regno)
 		 || (targetm.hard_regno_call_part_clobbered
-		     (regno, vd->e[regno].mode)))
+		     (insn, regno, vd->e[regno].mode)))
 		&& (regno < set_regno || regno >= set_regno + set_nregs))
 	      kill_value_regno (regno, 1, vd);
 
diff --git a/gcc/reginfo.c b/gcc/reginfo.c
index 7a7fa4d..315c5ec 100644
--- a/gcc/reginfo.c
+++ b/gcc/reginfo.c
@@ -639,7 +639,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
     if (hard_regno_nregs (regno, mode) == nregs
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
-	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
+	    || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
 	&& maybe_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
@@ -647,7 +647,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
     if (hard_regno_nregs (regno, mode) == nregs
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
-	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
+	    || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
 	&& maybe_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
@@ -655,7 +655,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
     if (hard_regno_nregs (regno, mode) == nregs
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
-	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
+	    || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
 	&& maybe_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
@@ -663,7 +663,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
     if (hard_regno_nregs (regno, mode) == nregs
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
-	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
+	    || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
 	&& maybe_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
@@ -677,7 +677,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
       if (hard_regno_nregs (regno, mode) == nregs
 	  && targetm.hard_regno_mode_ok (regno, mode)
 	  && (!call_saved
-	      || !targetm.hard_regno_call_part_clobbered (regno, mode)))
+	      || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode)))
 	return mode;
     }
 
diff --git a/gcc/regrename.c b/gcc/regrename.c
index a180ced..109add0 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -339,9 +339,9 @@ check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
 	 && ! DEBUG_INSN_P (tmp->insn))
 	|| (this_head->need_caller_save_reg
 	    && ! (targetm.hard_regno_call_part_clobbered
-		  (reg, GET_MODE (*tmp->loc)))
+		  (tmp->insn, reg, GET_MODE (*tmp->loc)))
 	    && (targetm.hard_regno_call_part_clobbered
-		(new_reg, GET_MODE (*tmp->loc)))))
+		(tmp->insn, new_reg, GET_MODE (*tmp->loc)))))
       return false;
 
   return true;
diff --git a/gcc/reload.c b/gcc/reload.c
index 6cfd5e2..0cc82d0 100644
--- a/gcc/reload.c
+++ b/gcc/reload.c
@@ -6912,13 +6912,16 @@ find_equiv_reg (rtx goal, rtx_insn *insn, enum reg_class rclass, int other,
 	  if (regno >= 0 && regno < FIRST_PSEUDO_REGISTER)
 	    for (i = 0; i < nregs; ++i)
 	      if (call_used_regs[regno + i]
-		  || targetm.hard_regno_call_part_clobbered (regno + i, mode))
+		  || targetm.hard_regno_call_part_clobbered (p,
+							     regno + i,
+							     mode))
 		return 0;
 
 	  if (valueno >= 0 && valueno < FIRST_PSEUDO_REGISTER)
 	    for (i = 0; i < valuenregs; ++i)
 	      if (call_used_regs[valueno + i]
-		  || targetm.hard_regno_call_part_clobbered (valueno + i,
+		  || targetm.hard_regno_call_part_clobbered (p,
+							     valueno + i,
 							     mode))
 		return 0;
 	}
diff --git a/gcc/reload1.c b/gcc/reload1.c
index b703402..5490ae5 100644
--- a/gcc/reload1.c
+++ b/gcc/reload1.c
@@ -8289,7 +8289,8 @@ emit_reload_insns (struct insn_chain *chain)
 			   : out_regno + k);
 		      reg_reloaded_insn[regno + k] = insn;
 		      SET_HARD_REG_BIT (reg_reloaded_valid, regno + k);
-		      if (targetm.hard_regno_call_part_clobbered (regno + k,
+		      if (targetm.hard_regno_call_part_clobbered (insn,
+								  regno + k,
 								  mode))
 			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
 					  regno + k);
@@ -8369,7 +8370,8 @@ emit_reload_insns (struct insn_chain *chain)
 			   : in_regno + k);
 		      reg_reloaded_insn[regno + k] = insn;
 		      SET_HARD_REG_BIT (reg_reloaded_valid, regno + k);
-		      if (targetm.hard_regno_call_part_clobbered (regno + k,
+		      if (targetm.hard_regno_call_part_clobbered (insn,
+								  regno + k,
 								  mode))
 			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
 					  regno + k);
@@ -8485,7 +8487,7 @@ emit_reload_insns (struct insn_chain *chain)
 		      CLEAR_HARD_REG_BIT (reg_reloaded_dead, src_regno + k);
 		      SET_HARD_REG_BIT (reg_reloaded_valid, src_regno + k);
 		      if (targetm.hard_regno_call_part_clobbered
-			  (src_regno + k, mode))
+			  (insn, src_regno + k, mode))
 			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
 					  src_regno + k);
 		      else
diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
index e15cf08..53c2e26 100644
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -3728,7 +3728,8 @@ deps_analyze_insn (struct deps_desc *deps, rtx_insn *insn)
              Since we only have a choice between 'might be clobbered'
              and 'definitely not clobbered', we must include all
              partly call-clobbered registers here.  */
-	    else if (targetm.hard_regno_call_part_clobbered (i,
+	    else if (targetm.hard_regno_call_part_clobbered (insn,
+							     i,
 							     reg_raw_mode[i])
                      || TEST_HARD_REG_BIT (regs_invalidated_by_call, i))
               SET_REGNO_REG_SET (reg_pending_clobbers, i);
diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index 2bae6ef..c6b4593 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -1102,7 +1102,7 @@ init_regs_for_mode (machine_mode mode)
       if (i >= 0)
         continue;
 
-      if (targetm.hard_regno_call_part_clobbered (cur_reg, mode))
+      if (targetm.hard_regno_call_part_clobbered (NULL, cur_reg, mode))
         SET_HARD_REG_BIT (sel_hrd.regs_for_call_clobbered[mode],
                           cur_reg);
 
@@ -1251,7 +1251,7 @@ mark_unavailable_hard_regs (def_t def, struct reg_rename *reg_rename_p,
 
   /* Exclude registers that are partially call clobbered.  */
   if (def->crosses_call
-      && !targetm.hard_regno_call_part_clobbered (regno, mode))
+      && !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
     AND_COMPL_HARD_REG_SET (reg_rename_p->available_for_renaming,
                             sel_hrd.regs_for_call_clobbered[mode]);
 
diff --git a/gcc/target.def b/gcc/target.def
index e8f0f70..ecb0ea7 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5772,8 +5772,21 @@ return true for a 64-bit mode but false for a 32-bit mode.\n\
 \n\
 The default implementation returns false, which is correct\n\
 for targets that don't have partly call-clobbered registers.",
- bool, (unsigned int regno, machine_mode mode),
- hook_bool_uint_mode_false)
+ bool, (rtx_insn *, unsigned int regno, machine_mode mode),
+ hook_bool_insn_uint_mode_false)
+
+DEFHOOK
+(return_call_with_max_clobbers,
+ "This hook returns a pointer to the call that partially clobbers the\n\
+most registers.  If a platform supports multiple ABIs where the registers\n\
+that are partially clobbered may vary, this function compares\n\
+two calls and returns a pointer to the one that clobbers the most registers.\n\
+\n\
+The registers clobbered in different ABIs must be a proper subset or\n\
+superset of all other ABIs.  @var{call_1} must always be a call insn,\n\
+@var{call_2} may be NULL or a call insn.",
+ rtx_insn *, (rtx_insn *call_1, rtx_insn *call_2),
+ NULL)
 
 /* Return the smallest number of different values for which it is best to
    use a jump-table instead of a tree of conditional branches.  */
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 898848f..2cbdc4a 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1930,7 +1930,7 @@ default_dwarf_frame_reg_mode (int regno)
 {
   machine_mode save_mode = reg_raw_mode[regno];
 
-  if (targetm.hard_regno_call_part_clobbered (regno, save_mode))
+  if (targetm.hard_regno_call_part_clobbered (NULL, regno, save_mode))
     save_mode = choose_hard_reg_mode (regno, 1, true);
   return save_mode;
 }
Richard Sandiford Jan. 7, 2019, 5:38 p.m. | #3
Steve Ellcey <sellcey@marvell.com> writes:
> On Thu, 2018-12-06 at 12:25 +0000, Richard Sandiford wrote:
>> 
>> Since we're looking at the call insns anyway, we could have a hook that
>> "jousts" two calls and picks the one that preserves *fewer* registers.
>> This would mean that loop produces a single instruction that conservatively
>> describes the call-preserved registers.  We could then stash that
>> instruction in lra_reg instead of the current check_part_clobbered
>> boolean.
>> 
>> The hook should by default be a null pointer, so that we can avoid
>> the instruction walk on targets that don't need it.
>> 
>> That would mean that LRA would always have a call instruction to hand
>> when asking about call-preserved information.  So I think we should
>> add an insn parameter to targetm.hard_regno_call_part_clobbered,
>> with a null insn selecting the default behaviour.  I know it's
>> going to be a pain to update all callers and targets, sorry.
>
> Richard,  here is an updated version of this patch.  It is not
> completely tested yet but I wanted to send this out and make
> sure it is what you had in mind and see if you had any comments about
> the new target function while I am testing it (including building
> some of the other targets).

Yeah, this was the kind of thing I had in mind, thanks.

>  /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
>     the lower 64 bits of a 128-bit register.  Tell the compiler the callee
>     clobbers the top 64 bits when restoring the bottom 64 bits.  */
>  
>  static bool
> -aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
> +aarch64_hard_regno_call_part_clobbered (rtx_insn *insn,
> +					unsigned int regno,
> +					machine_mode mode)
>  {
> +  if (insn && CALL_P (insn) && aarch64_simd_call_p (insn))
> +    return false;
>    return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);

This should be choosing between 8 and 16 for the maybe_gt, since
even SIMD functions clobber bits 128 and above for SVE.
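
To illustrate the point, here is a minimal standalone sketch (not code from
the patch).  It assumes a hypothetical "toy_call" struct standing in for a
call insn and a plain byte count standing in for GET_MODE_SIZE; every name
starting with "toy_" is made up for this example.  The idea is that the
SIMD case only raises the preserved size from 8 to 16 bytes rather than
returning false outright:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a call insn: it only records whether the
   callee follows the SIMD ABI.  */
struct toy_call
{
  bool simd_abi;
};

/* Toy counterpart of aarch64_simd_call_p: the caller must pass a call.  */
static bool
toy_simd_call_p (const struct toy_call *call)
{
  assert (call != NULL);
  return call->simd_abi;
}

/* Toy counterpart of the hook.  CALL may be NULL, meaning no particular
   call is known, which selects the conservative base-ABI answer.  */
static bool
toy_part_clobbered (const struct toy_call *call, bool fp_reg,
                    unsigned int mode_size)
{
  bool simd_p = call != NULL && toy_simd_call_p (call);
  /* The base ABI preserves the low 8 bytes of an FP register,
     the SIMD ABI the low 16 bytes.  */
  unsigned int preserved = simd_p ? 16 : 8;
  return fp_reg && mode_size > preserved;
}
```

With this shape a 16-byte value survives a SIMD call, while a 32-byte
(SVE-sized) value is still treated as partly clobbered.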

> +/* Implement TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.  */
> +
> +rtx_insn *
> +aarch64_return_call_with_max_clobbers (rtx_insn *call_1, rtx_insn *call_2)
> +{
> +  gcc_assert (CALL_P (call_1));
> +  if ((call_2 == NULL_RTX) || aarch64_simd_call_p (call_2))
> +    return call_1;
> +  else
> +    return call_2;

Nit: redundant parens in "(call_2 == NULL_RTX)".

> diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
> index 023308b..2cf993d 100644
> --- a/gcc/config/avr/avr.c
> +++ b/gcc/config/avr/avr.c
> @@ -12181,7 +12181,9 @@ avr_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>  /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  */
>  
>  static bool
> -avr_hard_regno_call_part_clobbered (unsigned regno, machine_mode mode)
> +avr_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
> +				    unsigned regno,
> +				    machine_mode mode)
>  {

Also very minor, sorry, but: I think it's usual to put the parameters
on the same line when they fit.  Same for the other hooks.

> @@ -2919,7 +2930,7 @@ the local anchor could be shared by other accesses to nearby locations.
>  
>  The hook returns true if it succeeds, storing the offset of the
>  anchor from the base in @var{offset1} and the offset of the final address
> -from the anchor in @var{offset2}.  The default implementation returns false.
> +from the anchor in @var{offset2}.  ehe defnult implementation returns false.
>  @end deftypefn

Stray change here.

> diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
> index 7ffcd35..31a567a 100644
> --- a/gcc/lra-constraints.c
> +++ b/gcc/lra-constraints.c
> @@ -5368,16 +5368,24 @@ inherit_reload_reg (bool def_p, int original_regno,
>  static inline bool
>  need_for_call_save_p (int regno)
>  {
> +  machine_mode pmode = PSEUDO_REGNO_MODE (regno);
> +  int new_regno = reg_renumber[regno];
> +
>    lra_assert (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0);
> -  return (usage_insns[regno].calls_num < calls_num
> -	  && (overlaps_hard_reg_set_p
> -	      ((flag_ipa_ra &&
> -		! hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
> -	       ? lra_reg_info[regno].actual_call_used_reg_set
> -	       : call_used_reg_set,
> -	       PSEUDO_REGNO_MODE (regno), reg_renumber[regno])
> -	      || (targetm.hard_regno_call_part_clobbered
> -		  (reg_renumber[regno], PSEUDO_REGNO_MODE (regno)))));
> +
> +  if (usage_insns[regno].calls_num >= calls_num)
> +    return false;
> +
> +  if (flag_ipa_ra
> +      && !hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
> +    return (overlaps_hard_reg_set_p
> +		(lra_reg_info[regno].actual_call_used_reg_set, pmode, new_regno)
> +	    || targetm.hard_regno_call_part_clobbered
> +		(lra_reg_info[regno].call_insn, new_regno, pmode));
> +  else
> +    return (overlaps_hard_reg_set_p (call_used_reg_set, pmode, new_regno)
> +            || targetm.hard_regno_call_part_clobbered
> +		(lra_reg_info[regno].call_insn, new_regno, pmode));
>  }
>  

I think it'd be safer just to add the new parameter to the existing code,
rather than rework it like this.

I'm not sure off-hand why the existing code tests
targetm.hard_regno_call_part_clobbered when we have
actual_call_used_reg_set.  Seems like it shouldn't be necessary
if we know exactly which registers the function clobbers.
Changing that should probably be a separate follow-on patch though.

>  /* Global registers occurring in the current EBB.  */


> @@ -635,6 +637,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
>    rtx link, *link_loc;
>    bool need_curr_point_incr;
>    HARD_REG_SET last_call_used_reg_set;
> +  rtx_insn *call_insn;
>  
>    reg_live_out = df_get_live_out (bb);
>    sparseset_clear (pseudos_live);
> @@ -658,6 +661,17 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
>    if (lra_dump_file != NULL)
>      fprintf (lra_dump_file, "  BB %d\n", bb->index);
>  
> +  call_insn = NULL;
> +  if (targetm.return_call_with_max_clobbers)
> +    {
> +      FOR_BB_INSNS_REVERSE_SAFE (bb, curr_insn, next)
> +	{
> +	  if (CALL_P (curr_insn))
> +	    call_insn = targetm.return_call_with_max_clobbers (curr_insn,
> +							       call_insn);
> +	}
> +    }
> +
>    /* Scan the code of this basic block, noting which pseudos and hard
>       regs are born or die.
>  
> @@ -847,7 +861,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
>  	      update_pseudo_point (reg->regno, curr_point, USE_POINT);
>  	      mark_regno_live (reg->regno, reg->biggest_mode);
>  	      check_pseudos_live_through_calls (reg->regno,
> -						last_call_used_reg_set);
> +						last_call_used_reg_set,
> +						call_insn);
>  	    }
>  
>  	  if (!HARD_REGISTER_NUM_P (reg->regno))
> @@ -912,9 +927,13 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
>  		{
>  		  IOR_HARD_REG_SET (lra_reg_info[j].actual_call_used_reg_set,
>  				    this_call_used_reg_set);
> +
> +		  lra_reg_info[j].call_insn = curr_insn;
> +
>  		  if (flush)
> -		    check_pseudos_live_through_calls
> -		      (j, last_call_used_reg_set);
> +		    check_pseudos_live_through_calls (j,
> +						      last_call_used_reg_set,
> +						      call_insn);
>  		}
>  	      COPY_HARD_REG_SET(last_call_used_reg_set, this_call_used_reg_set);
>  	    }
> @@ -956,7 +975,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
>  	      update_pseudo_point (reg->regno, curr_point, USE_POINT);
>  	    mark_regno_live (reg->regno, reg->biggest_mode);
>  	    check_pseudos_live_through_calls (reg->regno,
> -					      last_call_used_reg_set);
> +					      last_call_used_reg_set,
> +					      call_insn);
>  	  }
>  
>        for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
> @@ -1125,7 +1145,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
>        if (sparseset_cardinality (pseudos_live_through_calls) == 0)
>  	break;
>        if (sparseset_bit_p (pseudos_live_through_calls, j))
> -	check_pseudos_live_through_calls (j, last_call_used_reg_set);
> +	check_pseudos_live_through_calls (j, last_call_used_reg_set, call_insn);
>      }
>  
>    for (i = 0; HARD_REGISTER_NUM_P (i); ++i)

I think we can do this more accurately by instead keeping track of the
current call during the main block walk and extending this:

	  if (! flag_ipa_ra)
	    COPY_HARD_REG_SET(last_call_used_reg_set, call_used_reg_set);
	  else
	    {

so that we use the "else" when targetm.return_call_with_max_clobbers
is nonnull.  Then we should extend this:

	      bool flush = (! hard_reg_set_empty_p (last_call_used_reg_set)
			    && ! hard_reg_set_equal_p (last_call_used_reg_set,
						       this_call_used_reg_set))

so that we flush when the old call insn preserves more registers
than the new one.
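
One possible reading of the extended flush test, as a standalone sketch
rather than LRA code: a uint32_t bitmask stands in for HARD_REG_SET, and
"toy_call" and "toy_need_flush" are hypothetical names invented for this
example.  It assumes only two ABIs, where a SIMD call preserves more than
a base-ABI call:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a call insn.  */
struct toy_call
{
  bool simd_abi;
};

/* Return true if pseudos live across the previous call must be flushed
   before the current call is recorded.  LAST_CALL may be NULL when no
   call has been seen yet; THIS_CALL must be a call.  */
static bool
toy_need_flush (uint32_t last_set, uint32_t this_set,
                const struct toy_call *last_call,
                const struct toy_call *this_call)
{
  assert (this_call != NULL);

  /* Existing condition: the call-used register sets differ.  */
  bool sets_differ = last_set != 0 && last_set != this_set;

  /* Added condition: the old call preserved more registers than the
     new one, i.e. the old call was SIMD and the new one is not.  */
  bool old_preserves_more = (last_call != NULL
                             && last_call->simd_abi
                             && !this_call->simd_abi);

  return sets_differ || old_preserves_more;
}
```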

Also, I think:

		  lra_reg_info[j].call_insn = curr_insn;

should happen in check_pseudos_live_through_calls and should apply
targetm.return_call_with_max_clobbers to the register's existing
call_insn (if any), rather than simply overwrite it.  That way the
register info will track the "worst" call for all regions in which
the register is live.
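
A small standalone sketch of that accumulation, with hypothetical "toy_"
names standing in for rtx_insn, lra_reg and the hook (the comparison
mirrors the aarch64 implementation quoted earlier, but none of this is
code from the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a call insn.  */
struct toy_call
{
  bool simd_abi;
};

/* Hypothetical stand-in for the per-register LRA info.  */
struct toy_reg_info
{
  const struct toy_call *call_insn;
};

/* Toy version of the "joust": return whichever call clobbers more.
   CALL_1 must be a call; CALL_2 may be NULL.  */
static const struct toy_call *
toy_max_clobbers (const struct toy_call *call_1,
                  const struct toy_call *call_2)
{
  assert (call_1 != NULL);
  if (call_2 == NULL || call_2->simd_abi)
    return call_1;
  return call_2;
}

/* Merge CALL into INFO instead of overwriting, so INFO ends up with
   the worst call seen over all regions where the register is live.  */
static void
toy_note_call (struct toy_reg_info *info, const struct toy_call *call)
{
  info->call_insn = toy_max_clobbers (call, info->call_insn);
}
```

Once a base-ABI call has been recorded for a register, a later SIMD call
no longer replaces it, which is the conservative behaviour wanted here.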

> diff --git a/gcc/regrename.c b/gcc/regrename.c
> index a180ced..109add0 100644
> --- a/gcc/regrename.c
> +++ b/gcc/regrename.c
> @@ -339,9 +339,9 @@ check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
>  	 && ! DEBUG_INSN_P (tmp->insn))
>  	|| (this_head->need_caller_save_reg
>  	    && ! (targetm.hard_regno_call_part_clobbered
> -		  (reg, GET_MODE (*tmp->loc)))
> +		  (tmp->insn, reg, GET_MODE (*tmp->loc)))
>  	    && (targetm.hard_regno_call_part_clobbered
> -		(new_reg, GET_MODE (*tmp->loc)))))
> +		(tmp->insn, new_reg, GET_MODE (*tmp->loc)))))
>        return false;
>  
>    return true;


tmp->insn isn't the call we care about here.  I think we should just
pass null.

> diff --git a/gcc/reload1.c b/gcc/reload1.c
> index b703402..5490ae5 100644
> --- a/gcc/reload1.c
> +++ b/gcc/reload1.c
> @@ -8289,7 +8289,8 @@ emit_reload_insns (struct insn_chain *chain)
>  			   : out_regno + k);
>  		      reg_reloaded_insn[regno + k] = insn;
>  		      SET_HARD_REG_BIT (reg_reloaded_valid, regno + k);
> -		      if (targetm.hard_regno_call_part_clobbered (regno + k,
> +		      if (targetm.hard_regno_call_part_clobbered (insn,
> +								  regno + k,
>  								  mode))
>  			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
>  					  regno + k);
> @@ -8369,7 +8370,8 @@ emit_reload_insns (struct insn_chain *chain)
>  			   : in_regno + k);
>  		      reg_reloaded_insn[regno + k] = insn;
>  		      SET_HARD_REG_BIT (reg_reloaded_valid, regno + k);
> -		      if (targetm.hard_regno_call_part_clobbered (regno + k,
> +		      if (targetm.hard_regno_call_part_clobbered (insn,
> +								  regno + k,
>  								  mode))
>  			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
>  					  regno + k);
> @@ -8485,7 +8487,7 @@ emit_reload_insns (struct insn_chain *chain)
>  		      CLEAR_HARD_REG_BIT (reg_reloaded_dead, src_regno + k);
>  		      SET_HARD_REG_BIT (reg_reloaded_valid, src_regno + k);
>  		      if (targetm.hard_regno_call_part_clobbered
> -			  (src_regno + k, mode))
> +			  (insn, src_regno + k, mode))
>  			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
>  					  src_regno + k);
>  		      else

The insns in this case might not be call instructions.  I think for
the legacy reload.c, reload1.c and caller-save.c we can just pass NULL
rather than try to be optimal.

> diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
> index e15cf08..53c2e26 100644
> --- a/gcc/sched-deps.c
> +++ b/gcc/sched-deps.c
> @@ -3728,7 +3728,8 @@ deps_analyze_insn (struct deps_desc *deps, rtx_insn *insn)
>               Since we only have a choice between 'might be clobbered'
>               and 'definitely not clobbered', we must include all
>               partly call-clobbered registers here.  */
> -	    else if (targetm.hard_regno_call_part_clobbered (i,
> +	    else if (targetm.hard_regno_call_part_clobbered (insn,
> +							     i,
>  							     reg_raw_mode[i])
>                       || TEST_HARD_REG_BIT (regs_invalidated_by_call, i))
>                SET_REGNO_REG_SET (reg_pending_clobbers, i);

No need to split the line after "insn".

> diff --git a/gcc/target.def b/gcc/target.def
> index e8f0f70..ecb0ea7 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -5772,8 +5772,21 @@ return true for a 64-bit mode but false for a 32-bit mode.\n\
>  \n\
>  The default implementation returns false, which is correct\n\
>  for targets that don't have partly call-clobbered registers.",
> - bool, (unsigned int regno, machine_mode mode),
> - hook_bool_uint_mode_false)
> + bool, (rtx_insn *, unsigned int regno, machine_mode mode),
> + hook_bool_insn_uint_mode_false)

Should name the insn parameter and describe it in the docs.
(Realise this is just a first cut.)

> +DEFHOOK
> +(return_call_with_max_clobbers,
> + "This hook returns a pointer to the call that partially clobbers the\n\
> +most registers.  If a platform supports multiple ABIs where the registers\n\
> +that are partially clobbered may vary, this function compares two\n\
> +two calls and return a pointer to the one that clobbers the most registers.\n\

s/two two calls and return/two calls and returns/

Thanks,
Richard
Steve Ellcey Jan. 8, 2019, 11:42 p.m. | #4
On Mon, 2019-01-07 at 17:38 +0000, Richard Sandiford wrote:
> 
> Yeah, this was the kind of thing I had in mind, thanks.

Here is an updated version of the patch.  I bootstrapped and tested
on aarch64 and x86.  I didn't test the other platforms where I changed
the arguments to hard_regno_call_part_clobbered but I think they should
be OK.  I believe I addressed all the issues you brought up.  The ones
I am least confident of are the lra-lives.c changes.  I think they are
right and testing had no regressions, but they are probably the changes
that need to be checked most closely.

Steve Ellcey
sellcey@marvell.com


2019-01-08  Steve Ellcey  <sellcey@marvell.com>

	* config/aarch64/aarch64.c (aarch64_simd_call_p): New function.
	(aarch64_hard_regno_call_part_clobbered): Add insn argument.
	(aarch64_return_call_with_max_clobbers): New function.
	(TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New macro.
	* config/avr/avr.c (avr_hard_regno_call_part_clobbered): Add insn
	argument.
	* config/i386/i386.c (ix86_hard_regno_call_part_clobbered): Ditto.
	* config/mips/mips.c (mips_hard_regno_call_part_clobbered): Ditto.
	* config/rs6000/rs6000.c (rs6000_hard_regno_call_part_clobbered): Ditto.
	* config/s390/s390.c (s390_hard_regno_call_part_clobbered): Ditto.
	* cselib.c (cselib_process_insn): Add argument to
	targetm.hard_regno_call_part_clobbered call.
	* ira-conflicts.c (ira_build_conflicts): Ditto.
	* ira-costs.c (ira_tune_allocno_costs): Ditto.
	* lra-constraints.c (inherit_reload_reg): Ditto.
	* lra-int.h (struct lra_reg): Add call_insn field.
	* lra-lives.c (check_pseudos_live_through_calls): Add call_insn
	argument.  Call targetm.return_call_with_max_clobbers.
	Add argument to targetm.hard_regno_call_part_clobbered call.
	(process_bb_lives): Use new target function
	targetm.return_call_with_max_clobbers to set call_insn.
	Pass call_insn to check_pseudos_live_through_calls.
	Modify if to check targetm.return_call_with_max_clobbers.
	* lra.c (initialize_lra_reg_info_element): Set call_insn to NULL.
	* regcprop.c (copyprop_hardreg_forward_1): Add argument to
	targetm.hard_regno_call_part_clobbered call.
	* reginfo.c (choose_hard_reg_mode): Ditto.
	* regrename.c (check_new_reg_p): Ditto.
	* reload.c (find_equiv_reg): Ditto.
	* reload1.c (emit_reload_insns): Ditto.
	* sched-deps.c (deps_analyze_insn): Ditto.
	* sel-sched.c (init_regs_for_mode): Ditto.
	(mark_unavailable_hard_regs): Ditto.
	* targhooks.c (default_dwarf_frame_reg_mode): Ditto.
	* target.def (hard_regno_call_part_clobbered): Add insn argument.
	(return_call_with_max_clobbers): New target function.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in (TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New hook.
	* hooks.c (hook_bool_uint_mode_false): Change to
	hook_bool_insn_uint_mode_false.
	* hooks.h (hook_bool_uint_mode_false): Ditto.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1c45243..2063292 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1644,14 +1644,51 @@ aarch64_reg_save_mode (tree fndecl, unsigned regno)
 	   : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode);
 }
 
+/* Return true if the instruction is a call to a SIMD function, false
+   if it is not a SIMD function or if we do not know anything about
+   the function.  */
+
+static bool
+aarch64_simd_call_p (rtx_insn *insn)
+{
+  rtx symbol;
+  rtx call;
+  tree fndecl;
+
+  gcc_assert (CALL_P (insn));
+  call = get_call_rtx_from (insn);
+  symbol = XEXP (XEXP (call, 0), 0);
+  if (GET_CODE (symbol) != SYMBOL_REF)
+    return false;
+  fndecl = SYMBOL_REF_DECL (symbol);
+  if (!fndecl)
+    return false;
+
+  return aarch64_simd_decl_p (fndecl);
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
    the lower 64 bits of a 128-bit register.  Tell the compiler the callee
    clobbers the top 64 bits when restoring the bottom 64 bits.  */
 
 static bool
-aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+aarch64_hard_regno_call_part_clobbered (rtx_insn *insn, unsigned int regno,
+					machine_mode mode)
 {
-  return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);
+  bool simd_p = insn && CALL_P (insn) && aarch64_simd_call_p (insn);
+  return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), simd_p ? 16: 8);
+}
+
+/* Implement TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.  */
+
+rtx_insn *
+aarch64_return_call_with_max_clobbers (rtx_insn *call_1, rtx_insn *call_2)
+{
+  gcc_assert (CALL_P (call_1));
+  if (call_2 == NULL_RTX || aarch64_simd_call_p (call_2))
+    return call_1;
+  else
+    return call_2;
 }
 
 /* Implement REGMODE_NATURAL_SIZE.  */
@@ -18764,6 +18801,10 @@ aarch64_libgcc_floating_mode_supported_p
 #define TARGET_HARD_REGNO_CALL_PART_CLOBBERED \
   aarch64_hard_regno_call_part_clobbered
 
+#undef TARGET_RETURN_CALL_WITH_MAX_CLOBBERS
+#define TARGET_RETURN_CALL_WITH_MAX_CLOBBERS \
+  aarch64_return_call_with_max_clobbers
+
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT aarch64_constant_alignment
 
diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
index 023308b..a53b909 100644
--- a/gcc/config/avr/avr.c
+++ b/gcc/config/avr/avr.c
@@ -12181,7 +12181,8 @@ avr_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  */
 
 static bool
-avr_hard_regno_call_part_clobbered (unsigned regno, machine_mode mode)
+avr_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				    unsigned regno, machine_mode mode)
 {
   /* FIXME: This hook gets called with MODE:REGNO combinations that don't
         represent valid hard registers like, e.g. HI:29.  Returning TRUE
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d01278d..bb2b4c4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -40156,7 +40156,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
    the low 16 bytes are saved.  */
 
 static bool
-ix86_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+ix86_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				     unsigned int regno, machine_mode mode)
 {
   return SSE_REGNO_P (regno) && GET_MODE_SIZE (mode) > 16;
 }
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 95dc946..a8022b8 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -12906,7 +12906,8 @@ mips_hard_regno_scratch_ok (unsigned int regno)
    registers with MODE > 64 bits are part clobbered too.  */
 
 static bool
-mips_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+mips_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				     unsigned int regno, machine_mode mode)
 {
   if (TARGET_FLOATXX
       && hard_regno_nregs (regno, mode) == 1
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a257554..09c963b 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -2197,7 +2197,8 @@ rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  */
 
 static bool
-rs6000_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+rs6000_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				       unsigned int regno, machine_mode mode)
 {
   if (TARGET_32BIT
       && TARGET_POWERPC64
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index ea2be10..6a571a3 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -10098,7 +10098,8 @@ s390_hard_regno_scratch_ok (unsigned int regno)
    bytes are saved across calls, however.  */
 
 static bool
-s390_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+s390_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				     unsigned int regno, machine_mode mode)
 {
   if (!TARGET_64BIT
       && TARGET_ZARCH
diff --git a/gcc/cselib.c b/gcc/cselib.c
index cef4bc0..84c17c2 100644
--- a/gcc/cselib.c
+++ b/gcc/cselib.c
@@ -2770,7 +2770,7 @@ cselib_process_insn (rtx_insn *insn)
 	if (call_used_regs[i]
 	    || (REG_VALUES (i) && REG_VALUES (i)->elt
 		&& (targetm.hard_regno_call_part_clobbered
-		    (i, GET_MODE (REG_VALUES (i)->elt->val_rtx)))))
+		    (insn, i, GET_MODE (REG_VALUES (i)->elt->val_rtx)))))
 	  cselib_invalidate_regno (i, reg_raw_mode[i]);
 
       /* Since it is not clear how cselib is going to be used, be
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 976a700..97a2ade 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -1707,6 +1707,8 @@ of @code{CALL_USED_REGISTERS}.
 @cindex call-saved register
 @hook TARGET_HARD_REGNO_CALL_PART_CLOBBERED
 
+@hook TARGET_RETURN_CALL_WITH_MAX_CLOBBERS
+
 @findex fixed_regs
 @findex call_used_regs
 @findex global_regs
diff --git a/gcc/hooks.c b/gcc/hooks.c
index bbc35fc..f95659b 100644
--- a/gcc/hooks.c
+++ b/gcc/hooks.c
@@ -142,7 +142,7 @@ hook_bool_puint64_puint64_true (poly_uint64, poly_uint64)
 
 /* Generic hook that takes (unsigned int, machine_mode) and returns false.  */
 bool
-hook_bool_uint_mode_false (unsigned int, machine_mode)
+hook_bool_insn_uint_mode_false (rtx_insn *, unsigned int, machine_mode)
 {
   return false;
 }
diff --git a/gcc/hooks.h b/gcc/hooks.h
index 9e4bc29..dc6b2e1 100644
--- a/gcc/hooks.h
+++ b/gcc/hooks.h
@@ -40,7 +40,9 @@ extern bool hook_bool_const_rtx_insn_const_rtx_insn_true (const rtx_insn *,
 extern bool hook_bool_mode_uhwi_false (machine_mode,
 				       unsigned HOST_WIDE_INT);
 extern bool hook_bool_puint64_puint64_true (poly_uint64, poly_uint64);
-extern bool hook_bool_uint_mode_false (unsigned int, machine_mode);
+extern bool hook_bool_insn_uint_mode_false (rtx_insn *,
+					    unsigned int,
+					    machine_mode);
 extern bool hook_bool_uint_mode_true (unsigned int, machine_mode);
 extern bool hook_bool_tree_false (tree);
 extern bool hook_bool_const_tree_false (const_tree);
diff --git a/gcc/ira-conflicts.c b/gcc/ira-conflicts.c
index b57468b..b697e57 100644
--- a/gcc/ira-conflicts.c
+++ b/gcc/ira-conflicts.c
@@ -808,7 +808,8 @@ ira_build_conflicts (void)
 		 regs must conflict with them.  */
 	      for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 		if (!TEST_HARD_REG_BIT (call_used_reg_set, regno)
-		    && targetm.hard_regno_call_part_clobbered (regno,
+		    && targetm.hard_regno_call_part_clobbered (NULL,
+							       regno,
 							       obj_mode))
 		  {
 		    SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), regno);
diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c
index e5d8804..7f60712 100644
--- a/gcc/ira-costs.c
+++ b/gcc/ira-costs.c
@@ -2379,7 +2379,8 @@ ira_tune_allocno_costs (void)
 						   *crossed_calls_clobber_regs)
 		  && (ira_hard_reg_set_intersection_p (regno, mode,
 						       call_used_reg_set)
-		      || targetm.hard_regno_call_part_clobbered (regno,
+		      || targetm.hard_regno_call_part_clobbered (NULL,
+								 regno,
 								 mode)))
 		cost += (ALLOCNO_CALL_FREQ (a)
 			 * (ira_memory_move_cost[mode][rclass][0]
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 7ffcd35..bd31a40 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -5377,7 +5377,8 @@ need_for_call_save_p (int regno)
 	       : call_used_reg_set,
 	       PSEUDO_REGNO_MODE (regno), reg_renumber[regno])
 	      || (targetm.hard_regno_call_part_clobbered
-		  (reg_renumber[regno], PSEUDO_REGNO_MODE (regno)))));
+		  (lra_reg_info[regno].call_insn,
+		   reg_renumber[regno], PSEUDO_REGNO_MODE (regno)))));
 }
 
 /* Global registers occurring in the current EBB.  */
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 9d9e81d..ccc7b00 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -117,6 +117,8 @@ struct lra_reg
   /* This member is set up in lra-lives.c for subsequent
      assignments.  */
   lra_copy_t copies;
+  /* Call instruction that may affect this register.  */
+  rtx_insn *call_insn;
 };
 
 /* References to the common info about each register.  */
diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
index 7b60691..0b96891 100644
--- a/gcc/lra-lives.c
+++ b/gcc/lra-lives.c
@@ -579,18 +579,26 @@ lra_setup_reload_pseudo_preferenced_hard_reg (int regno,
    PSEUDOS_LIVE_THROUGH_CALLS and PSEUDOS_LIVE_THROUGH_SETJUMPS.  */
 static inline void
 check_pseudos_live_through_calls (int regno,
-				  HARD_REG_SET last_call_used_reg_set)
+				  HARD_REG_SET last_call_used_reg_set,
+				  rtx_insn *call_insn)
 {
   int hr;
 
+  if (call_insn && CALL_P (call_insn) && targetm.return_call_with_max_clobbers)
+    lra_reg_info[regno].call_insn =
+      targetm.return_call_with_max_clobbers (call_insn,
+					     lra_reg_info[regno].call_insn);
+
   if (! sparseset_bit_p (pseudos_live_through_calls, regno))
     return;
+
   sparseset_clear_bit (pseudos_live_through_calls, regno);
   IOR_HARD_REG_SET (lra_reg_info[regno].conflict_hard_regs,
 		    last_call_used_reg_set);
 
   for (hr = 0; HARD_REGISTER_NUM_P (hr); hr++)
-    if (targetm.hard_regno_call_part_clobbered (hr,
+    if (targetm.hard_regno_call_part_clobbered (call_insn,
+						hr,
 						PSEUDO_REGNO_MODE (regno)))
       add_to_hard_reg_set (&lra_reg_info[regno].conflict_hard_regs,
 			   PSEUDO_REGNO_MODE (regno), hr);
@@ -635,6 +643,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
   rtx link, *link_loc;
   bool need_curr_point_incr;
   HARD_REG_SET last_call_used_reg_set;
+  rtx_insn *call_insn;
 
   reg_live_out = df_get_live_out (bb);
   sparseset_clear (pseudos_live);
@@ -658,6 +667,17 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
   if (lra_dump_file != NULL)
     fprintf (lra_dump_file, "  BB %d\n", bb->index);
 
+  call_insn = NULL;
+  if (targetm.return_call_with_max_clobbers)
+    {
+      FOR_BB_INSNS_REVERSE_SAFE (bb, curr_insn, next)
+	{
+	  if (CALL_P (curr_insn))
+	    call_insn = targetm.return_call_with_max_clobbers (curr_insn,
+							       call_insn);
+	}
+    }
+
   /* Scan the code of this basic block, noting which pseudos and hard
      regs are born or die.
 
@@ -847,7 +867,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 	      update_pseudo_point (reg->regno, curr_point, USE_POINT);
 	      mark_regno_live (reg->regno, reg->biggest_mode);
 	      check_pseudos_live_through_calls (reg->regno,
-						last_call_used_reg_set);
+						last_call_used_reg_set,
+						call_insn);
 	    }
 
 	  if (!HARD_REGISTER_NUM_P (reg->regno))
@@ -896,7 +917,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 
       if (call_p)
 	{
-	  if (! flag_ipa_ra)
+	  if (! flag_ipa_ra && ! targetm.return_call_with_max_clobbers)
 	    COPY_HARD_REG_SET(last_call_used_reg_set, call_used_reg_set);
 	  else
 	    {
@@ -912,9 +933,11 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 		{
 		  IOR_HARD_REG_SET (lra_reg_info[j].actual_call_used_reg_set,
 				    this_call_used_reg_set);
+
 		  if (flush)
-		    check_pseudos_live_through_calls
-		      (j, last_call_used_reg_set);
+		    check_pseudos_live_through_calls (j,
+						      last_call_used_reg_set,
+						      curr_insn);
 		}
 	      COPY_HARD_REG_SET(last_call_used_reg_set, this_call_used_reg_set);
 	    }
@@ -956,7 +979,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 	      update_pseudo_point (reg->regno, curr_point, USE_POINT);
 	    mark_regno_live (reg->regno, reg->biggest_mode);
 	    check_pseudos_live_through_calls (reg->regno,
-					      last_call_used_reg_set);
+					      last_call_used_reg_set,
+					      call_insn);
 	  }
 
       for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
@@ -1125,7 +1149,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
       if (sparseset_cardinality (pseudos_live_through_calls) == 0)
 	break;
       if (sparseset_bit_p (pseudos_live_through_calls, j))
-	check_pseudos_live_through_calls (j, last_call_used_reg_set);
+	check_pseudos_live_through_calls (j, last_call_used_reg_set, call_insn);
     }
 
   for (i = 0; HARD_REGISTER_NUM_P (i); ++i)
diff --git a/gcc/lra.c b/gcc/lra.c
index 75ee742..b0e999f 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -1344,6 +1344,7 @@ initialize_lra_reg_info_element (int i)
   lra_reg_info[i].val = get_new_reg_value ();
   lra_reg_info[i].offset = 0;
   lra_reg_info[i].copies = NULL;
+  lra_reg_info[i].call_insn = NULL;
 }
 
 /* Initialize common reg info and copies.  */
diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index b107ea2..e6bdeb0 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -1054,7 +1054,7 @@ copyprop_hardreg_forward_1 (basic_block bb, struct value_data *vd)
 	  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 	    if ((TEST_HARD_REG_BIT (regs_invalidated_by_this_call, regno)
 		 || (targetm.hard_regno_call_part_clobbered
-		     (regno, vd->e[regno].mode)))
+		     (insn, regno, vd->e[regno].mode)))
 		&& (regno < set_regno || regno >= set_regno + set_nregs))
 	      kill_value_regno (regno, 1, vd);
 
diff --git a/gcc/reginfo.c b/gcc/reginfo.c
index 7a7fa4d..315c5ec 100644
--- a/gcc/reginfo.c
+++ b/gcc/reginfo.c
@@ -639,7 +639,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
     if (hard_regno_nregs (regno, mode) == nregs
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
-	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
+	    || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
 	&& maybe_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
@@ -647,7 +647,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
     if (hard_regno_nregs (regno, mode) == nregs
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
-	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
+	    || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
 	&& maybe_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
@@ -655,7 +655,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
     if (hard_regno_nregs (regno, mode) == nregs
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
-	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
+	    || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
 	&& maybe_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
@@ -663,7 +663,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
     if (hard_regno_nregs (regno, mode) == nregs
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
-	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
+	    || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
 	&& maybe_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
@@ -677,7 +677,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
       if (hard_regno_nregs (regno, mode) == nregs
 	  && targetm.hard_regno_mode_ok (regno, mode)
 	  && (!call_saved
-	      || !targetm.hard_regno_call_part_clobbered (regno, mode)))
+	      || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode)))
 	return mode;
     }
 
diff --git a/gcc/regrename.c b/gcc/regrename.c
index a180ced..637b3cb 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -339,9 +339,9 @@ check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
 	 && ! DEBUG_INSN_P (tmp->insn))
 	|| (this_head->need_caller_save_reg
 	    && ! (targetm.hard_regno_call_part_clobbered
-		  (reg, GET_MODE (*tmp->loc)))
+		  (NULL, reg, GET_MODE (*tmp->loc)))
 	    && (targetm.hard_regno_call_part_clobbered
-		(new_reg, GET_MODE (*tmp->loc)))))
+		(NULL, new_reg, GET_MODE (*tmp->loc)))))
       return false;
 
   return true;
diff --git a/gcc/reload.c b/gcc/reload.c
index 6cfd5e2..bff84da 100644
--- a/gcc/reload.c
+++ b/gcc/reload.c
@@ -6912,13 +6912,16 @@ find_equiv_reg (rtx goal, rtx_insn *insn, enum reg_class rclass, int other,
 	  if (regno >= 0 && regno < FIRST_PSEUDO_REGISTER)
 	    for (i = 0; i < nregs; ++i)
 	      if (call_used_regs[regno + i]
-		  || targetm.hard_regno_call_part_clobbered (regno + i, mode))
+		  || targetm.hard_regno_call_part_clobbered (NULL,
+							     regno + i,
+							     mode))
 		return 0;
 
 	  if (valueno >= 0 && valueno < FIRST_PSEUDO_REGISTER)
 	    for (i = 0; i < valuenregs; ++i)
 	      if (call_used_regs[valueno + i]
-		  || targetm.hard_regno_call_part_clobbered (valueno + i,
+		  || targetm.hard_regno_call_part_clobbered (NULL,
+							     valueno + i,
 							     mode))
 		return 0;
 	}
diff --git a/gcc/reload1.c b/gcc/reload1.c
index b703402..66187f6 100644
--- a/gcc/reload1.c
+++ b/gcc/reload1.c
@@ -8289,7 +8289,8 @@ emit_reload_insns (struct insn_chain *chain)
 			   : out_regno + k);
 		      reg_reloaded_insn[regno + k] = insn;
 		      SET_HARD_REG_BIT (reg_reloaded_valid, regno + k);
-		      if (targetm.hard_regno_call_part_clobbered (regno + k,
+		      if (targetm.hard_regno_call_part_clobbered (NULL,
+								  regno + k,
 								  mode))
 			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
 					  regno + k);
@@ -8369,7 +8370,8 @@ emit_reload_insns (struct insn_chain *chain)
 			   : in_regno + k);
 		      reg_reloaded_insn[regno + k] = insn;
 		      SET_HARD_REG_BIT (reg_reloaded_valid, regno + k);
-		      if (targetm.hard_regno_call_part_clobbered (regno + k,
+		      if (targetm.hard_regno_call_part_clobbered (NULL,
+								  regno + k,
 								  mode))
 			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
 					  regno + k);
@@ -8485,7 +8487,7 @@ emit_reload_insns (struct insn_chain *chain)
 		      CLEAR_HARD_REG_BIT (reg_reloaded_dead, src_regno + k);
 		      SET_HARD_REG_BIT (reg_reloaded_valid, src_regno + k);
 		      if (targetm.hard_regno_call_part_clobbered
-			  (src_regno + k, mode))
+			  (NULL, src_regno + k, mode))
 			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
 					  src_regno + k);
 		      else
diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
index e15cf08..14f0b66 100644
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -3728,7 +3728,7 @@ deps_analyze_insn (struct deps_desc *deps, rtx_insn *insn)
              Since we only have a choice between 'might be clobbered'
              and 'definitely not clobbered', we must include all
              partly call-clobbered registers here.  */
-	    else if (targetm.hard_regno_call_part_clobbered (i,
+	    else if (targetm.hard_regno_call_part_clobbered (insn, i,
 							     reg_raw_mode[i])
                      || TEST_HARD_REG_BIT (regs_invalidated_by_call, i))
               SET_REGNO_REG_SET (reg_pending_clobbers, i);
diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index 2bae6ef..c6b4593 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -1102,7 +1102,7 @@ init_regs_for_mode (machine_mode mode)
       if (i >= 0)
         continue;
 
-      if (targetm.hard_regno_call_part_clobbered (cur_reg, mode))
+      if (targetm.hard_regno_call_part_clobbered (NULL, cur_reg, mode))
         SET_HARD_REG_BIT (sel_hrd.regs_for_call_clobbered[mode],
                           cur_reg);
 
@@ -1251,7 +1251,7 @@ mark_unavailable_hard_regs (def_t def, struct reg_rename *reg_rename_p,
 
   /* Exclude registers that are partially call clobbered.  */
   if (def->crosses_call
-      && !targetm.hard_regno_call_part_clobbered (regno, mode))
+      && !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
     AND_COMPL_HARD_REG_SET (reg_rename_p->available_for_renaming,
                             sel_hrd.regs_for_call_clobbered[mode]);
 
diff --git a/gcc/target.def b/gcc/target.def
index e8f0f70..839d0b9 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5766,14 +5766,29 @@ DEFHOOK
 (hard_regno_call_part_clobbered,
  "This hook should return true if @var{regno} is partly call-saved and\n\
 partly call-clobbered, and if a value of mode @var{mode} would be partly\n\
-clobbered by a call.  For example, if the low 32 bits of @var{regno} are\n\
-preserved across a call but higher bits are clobbered, this hook should\n\
-return true for a 64-bit mode but false for a 32-bit mode.\n\
+clobbered by the @var{insn} call.  If @var{insn} is NULL then it should\n\
+return true if any call could partly clobber the register.  For example,\n\
+if the low 32 bits of @var{regno} are preserved across a call but higher\n\
+bits are clobbered, this hook should return true for a 64-bit mode but\n\
+false for a 32-bit mode.\n\
 \n\
 The default implementation returns false, which is correct\n\
 for targets that don't have partly call-clobbered registers.",
- bool, (unsigned int regno, machine_mode mode),
- hook_bool_uint_mode_false)
+ bool, (rtx_insn *insn, unsigned int regno, machine_mode mode),
+ hook_bool_insn_uint_mode_false)
+
+DEFHOOK
+(return_call_with_max_clobbers,
+ "This hook returns a pointer to the call that partially clobbers the\n\
+most registers.  If a platform supports multiple ABIs where the registers\n\
+that are partially clobbered may vary, this function compares two\n\
+calls and returns a pointer to the one that clobbers the most registers.\n\
+\n\
+The registers clobbered in different ABIs must be a proper subset or\n\
+superset of all other ABIs.  @var{call_1} must always be a call insn,\n\
+call_2 may be NULL or a call insn.",
+ rtx_insn *, (rtx_insn *call_1, rtx_insn *call_2),
+ NULL)
 
 /* Return the smallest number of different values for which it is best to
    use a jump-table instead of a tree of conditional branches.  */
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 898848f..2cbdc4a 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1930,7 +1930,7 @@ default_dwarf_frame_reg_mode (int regno)
 {
   machine_mode save_mode = reg_raw_mode[regno];
 
-  if (targetm.hard_regno_call_part_clobbered (regno, save_mode))
+  if (targetm.hard_regno_call_part_clobbered (NULL, regno, save_mode))
     save_mode = choose_hard_reg_mode (regno, 1, true);
   return save_mode;
 }
Richard Sandiford Jan. 9, 2019, 10 a.m. | #5
Steve Ellcey <sellcey@marvell.com> writes:
>  /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
>     the lower 64 bits of a 128-bit register.  Tell the compiler the callee
>     clobbers the top 64 bits when restoring the bottom 64 bits.  */
>  
>  static bool
> -aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
> +aarch64_hard_regno_call_part_clobbered (rtx_insn *insn, unsigned int regno,
> +					machine_mode mode)
>  {
> -  return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);
> +  bool simd_p = insn && CALL_P (insn) && aarch64_simd_call_p (insn);
> +  return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), simd_p ? 16: 8);

Should be a space before the ":" (which pushes the line over 80 chars).

> diff --git a/gcc/hooks.h b/gcc/hooks.h
> index 9e4bc29..dc6b2e1 100644
> --- a/gcc/hooks.h
> +++ b/gcc/hooks.h
> @@ -40,7 +40,9 @@ extern bool hook_bool_const_rtx_insn_const_rtx_insn_true (const rtx_insn *,
>  extern bool hook_bool_mode_uhwi_false (machine_mode,
>  				       unsigned HOST_WIDE_INT);
>  extern bool hook_bool_puint64_puint64_true (poly_uint64, poly_uint64);
> -extern bool hook_bool_uint_mode_false (unsigned int, machine_mode);
> +extern bool hook_bool_insn_uint_mode_false (rtx_insn *,
> +					    unsigned int,
> +					    machine_mode);

No need to break the line after "rtx_insn *,".

>  extern bool hook_bool_uint_mode_true (unsigned int, machine_mode);
>  extern bool hook_bool_tree_false (tree);
>  extern bool hook_bool_const_tree_false (const_tree);
> diff --git a/gcc/ira-conflicts.c b/gcc/ira-conflicts.c
> index b57468b..b697e57 100644
> --- a/gcc/ira-conflicts.c
> +++ b/gcc/ira-conflicts.c
> @@ -808,7 +808,8 @@ ira_build_conflicts (void)
>  		 regs must conflict with them.  */
>  	      for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>  		if (!TEST_HARD_REG_BIT (call_used_reg_set, regno)
> -		    && targetm.hard_regno_call_part_clobbered (regno,
> +		    && targetm.hard_regno_call_part_clobbered (NULL,
> +							       regno,
>  							       obj_mode))
>  		  {
>  		    SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), regno);

No need to break the line after "NULL,".

> diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c
> index e5d8804..7f60712 100644
> --- a/gcc/ira-costs.c
> +++ b/gcc/ira-costs.c
> @@ -2379,7 +2379,8 @@ ira_tune_allocno_costs (void)
>  						   *crossed_calls_clobber_regs)
>  		  && (ira_hard_reg_set_intersection_p (regno, mode,
>  						       call_used_reg_set)
> -		      || targetm.hard_regno_call_part_clobbered (regno,
> +		      || targetm.hard_regno_call_part_clobbered (NULL,
> +								 regno,
>  								 mode)))
>  		cost += (ALLOCNO_CALL_FREQ (a)
>  			 * (ira_memory_move_cost[mode][rclass][0]

Same here.

> diff --git a/gcc/lra-int.h b/gcc/lra-int.h
> index 9d9e81d..ccc7b00 100644
> --- a/gcc/lra-int.h
> +++ b/gcc/lra-int.h
> @@ -117,6 +117,8 @@ struct lra_reg
>    /* This member is set up in lra-lives.c for subsequent
>       assignments.  */
>    lra_copy_t copies;
> +  /* Call instruction that may affect this register.  */
> +  rtx_insn *call_insn;
>  };
>  
>  /* References to the common info about each register.  */

If we do this right, I think the new field should be able to replace call_p.
The pseudo crosses a call iff call_insn is nonnull.

I think the field belongs after:

  poly_int64 offset;

since it comes under:

  /* The following fields are defined only for pseudos.	 */

rather than:

  /* These members are set up in lra-lives.c and updated in
     lra-coalesce.c.  */

> diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
> index 7b60691..0b96891 100644
> --- a/gcc/lra-lives.c
> +++ b/gcc/lra-lives.c
> @@ -579,18 +579,26 @@ lra_setup_reload_pseudo_preferenced_hard_reg (int regno,
>     PSEUDOS_LIVE_THROUGH_CALLS and PSEUDOS_LIVE_THROUGH_SETJUMPS.  */
>  static inline void
>  check_pseudos_live_through_calls (int regno,
> -				  HARD_REG_SET last_call_used_reg_set)
> +				  HARD_REG_SET last_call_used_reg_set,
> +				  rtx_insn *call_insn)
>  {
>    int hr;
>  
> +  if (call_insn && CALL_P (call_insn) && targetm.return_call_with_max_clobbers)
> +    lra_reg_info[regno].call_insn =
> +      targetm.return_call_with_max_clobbers (call_insn,
> +					     lra_reg_info[regno].call_insn);
> +

This should happen...

>    if (! sparseset_bit_p (pseudos_live_through_calls, regno))
>      return;

...here, where we know that regno is live across a call like that
described by call_insn.

call_insn should be nonnull and a CALL_P in that case.  We should assert
for that regardless of whether targetm.return_call_with_max_clobbers is
nonnull.

The update should be something like:

  rtx_insn *old_call_insn = lra_reg_info[regno].call_insn;
  if (!old_call_insn
      || (targetm.return_call_with_max_clobbers
	  && targetm.return_call_with_max_clobbers (old_call_insn,
						    call_insn) == call_insn))
    lra_reg_info[regno].call_insn = call_insn;

>    sparseset_clear_bit (pseudos_live_through_calls, regno);
>    IOR_HARD_REG_SET (lra_reg_info[regno].conflict_hard_regs,
>  		    last_call_used_reg_set);
>  
>    for (hr = 0; HARD_REGISTER_NUM_P (hr); hr++)
> -    if (targetm.hard_regno_call_part_clobbered (hr,
> +    if (targetm.hard_regno_call_part_clobbered (call_insn,
> +						hr,
>  						PSEUDO_REGNO_MODE (regno)))
>        add_to_hard_reg_set (&lra_reg_info[regno].conflict_hard_regs,
>  			   PSEUDO_REGNO_MODE (regno), hr);

No need to break the line after "call_insn,".

> @@ -658,6 +667,17 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
>    if (lra_dump_file != NULL)
>      fprintf (lra_dump_file, "  BB %d\n", bb->index);
>  
> +  call_insn = NULL;
> +  if (targetm.return_call_with_max_clobbers)
> +    {
> +      FOR_BB_INSNS_REVERSE_SAFE (bb, curr_insn, next)
> +	{
> +	  if (CALL_P (curr_insn))
> +	    call_insn = targetm.return_call_with_max_clobbers (curr_insn,
> +							       call_insn);
> +	}
> +    }
> +
>    /* Scan the code of this basic block, noting which pseudos and hard
>       regs are born or die.
>  

What I meant about keeping track of the current call during the main
block walk is that we shouldn't have this loop.  Instead we should
update call_insn when we see a CALL_P...

> @@ -847,7 +867,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
>  	      update_pseudo_point (reg->regno, curr_point, USE_POINT);
>  	      mark_regno_live (reg->regno, reg->biggest_mode);
>  	      check_pseudos_live_through_calls (reg->regno,
> -						last_call_used_reg_set);
> +						last_call_used_reg_set,
> +						call_insn);
>  	    }
>  
>  	  if (!HARD_REGISTER_NUM_P (reg->regno))
> @@ -896,7 +917,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
>  
>        if (call_p)
>  	{
> -	  if (! flag_ipa_ra)
> +	  if (! flag_ipa_ra && ! targetm.return_call_with_max_clobbers)
>  	    COPY_HARD_REG_SET(last_call_used_reg_set, call_used_reg_set);
>  	  else
>  	    {

...here.  In the "if" arm we can set call_insn to the current instruction,
because any call will do.  In the "else" arm we should extend this:

	      bool flush = (! hard_reg_set_empty_p (last_call_used_reg_set)
			    && ! hard_reg_set_equal_p (last_call_used_reg_set,
						       this_call_used_reg_set))

so that we also flush when the old call insn part-clobbers a
different set of registers.

The idea is that we're trying to do something similar to -fipa-ra:
the registers clobbered by a call depend on the call_insn.  -fipa-ra
does that by dividing the block into regions with the same call
behaviour.  Then:

- pseudos_live_through_calls says which registers are live across
  a call in the current region

- last_call_used_reg_set describes the set of registers that are
  clobbered by calls in the current region

A new region starts whenever we find a call instruction that clobbers a
different set of registers from those in last_call_used_reg_set (whether
that's more registers or fewer).
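
As a rough illustration of the region idea only (a toy model, not LRA
code): assume each call in a block is reduced to a bitmask of the
registers it clobbers, and the block is walked backwards as
process_bb_lives does.  The name toy_count_regions and the bitmask
representation are hypothetical.

```c
#include <stddef.h>

/* Toy model: CLOBBER_MASKS[i] is the set of registers clobbered by the
   i-th call in a block, in program order.  Walk the calls backwards and
   count the regions, where a new region starts whenever a call clobbers
   a different set of registers from the previous call seen (whether
   that is more registers or fewer).  */
static size_t
toy_count_regions (const unsigned *clobber_masks, size_t n_calls)
{
  size_t regions = 0;
  unsigned last_mask = 0;	/* Only meaningful once regions > 0.  */

  for (size_t i = n_calls; i-- > 0;)
    {
      unsigned this_mask = clobber_masks[i];

      /* The first call seen opens the first region; after that, any
	 change in the clobber set flushes and opens a new region.  */
      if (regions == 0 || this_mask != last_mask)
	regions++;
      last_mask = this_mask;
    }
  return regions;
}
```

In this model, calls with masks 3, 3, 1, 3 form three regions: the two
leading calls share a region, and each change of mask starts another.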

We want to divide the block into regions in a similar way for
return_call_with_max_clobbers.  A new region starts whenever the
new call preserves a different set of registers from the original call.
This can be tested by something like:

/* Return true if call instructions CALL1 and CALL2 use ABIs that
   preserve the same set of registers.  */

static bool
calls_have_same_clobbers_p (rtx_insn *call1, rtx_insn *call2)
{
  if (!targetm.return_call_with_max_clobbers)
    return false;

  return (targetm.return_call_with_max_clobbers (call1, call2) == call1
	  && targetm.return_call_with_max_clobbers (call2, call1) == call2);
}

I think it would make sense for return_call_with_max_clobbers to only
handle nonnull insns (I probably said otherwise earlier, sorry).

> diff --git a/gcc/reload.c b/gcc/reload.c
> index 6cfd5e2..bff84da 100644
> --- a/gcc/reload.c
> +++ b/gcc/reload.c
> @@ -6912,13 +6912,16 @@ find_equiv_reg (rtx goal, rtx_insn *insn, enum reg_class rclass, int other,
>  	  if (regno >= 0 && regno < FIRST_PSEUDO_REGISTER)
>  	    for (i = 0; i < nregs; ++i)
>  	      if (call_used_regs[regno + i]
> -		  || targetm.hard_regno_call_part_clobbered (regno + i, mode))
> +		  || targetm.hard_regno_call_part_clobbered (NULL,
> +							     regno + i,
> +							     mode))
>  		return 0;
>  
>  	  if (valueno >= 0 && valueno < FIRST_PSEUDO_REGISTER)
>  	    for (i = 0; i < valuenregs; ++i)
>  	      if (call_used_regs[valueno + i]
> -		  || targetm.hard_regno_call_part_clobbered (valueno + i,
> +		  || targetm.hard_regno_call_part_clobbered (NULL,
> +							     valueno + i,
>  							     mode))
>  		return 0;
>  	}

No need to split the lines after "NULL,".

> diff --git a/gcc/target.def b/gcc/target.def
> index e8f0f70..839d0b9 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -5766,14 +5766,29 @@ DEFHOOK
>  (hard_regno_call_part_clobbered,
>   "This hook should return true if @var{regno} is partly call-saved and\n\
>  partly call-clobbered, and if a value of mode @var{mode} would be partly\n\
> -clobbered by a call.  For example, if the low 32 bits of @var{regno} are\n\
> -preserved across a call but higher bits are clobbered, this hook should\n\
> -return true for a 64-bit mode but false for a 32-bit mode.\n\
> +clobbered by the @var{insn} call.  If @var{insn} is NULL then it should\n\

maybe "by call instruction @var{insn}".

Thanks,
Richard
Steve Ellcey Jan. 10, 2019, 12:19 a.m. | #6
On Wed, 2019-01-09 at 10:00 +0000, Richard Sandiford wrote:

Thanks for the quick turnaround on the comments, Richard.  Here is a new
version where I tried to address all the issues you raised.  One thing
I noticed is that I think your calls_have_same_clobbers_p function only
works if, when return_call_with_max_clobbers is called with two calls
that clobber the same set of registers, it always returns the first
call.

I don't think my original function had that guarantee but I changed it 
so that it would and documented that requirement in target.def.  I
couldn't see a better way to implement the calls_have_same_clobbers_p
function other than doing that.
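
To make the tie-break requirement concrete, here is a small standalone
sketch.  It is not GCC code: struct toy_call, the two toy hooks and
toy_same_clobbers_p are hypothetical stand-ins, and the only property
modelled is whether the callee uses the SIMD ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a call insn: all we model is whether the callee
   uses the SIMD ABI, which partially clobbers fewer registers.  */
struct toy_call { bool simd_p; };

typedef const struct toy_call *(*toy_hook) (const struct toy_call *,
					    const struct toy_call *);

/* Model of the hook with the guarantee: on a tie, return CALL_1.  */
static const struct toy_call *
toy_max_clobbers (const struct toy_call *call_1,
		  const struct toy_call *call_2)
{
  assert (call_1 && call_2);
  if (call_1->simd_p == call_2->simd_p)
    return call_1;
  /* A non-SIMD call clobbers more than a SIMD call.  */
  return call_2->simd_p ? call_1 : call_2;
}

/* Same hook without the guarantee: on a tie, return CALL_2.  */
static const struct toy_call *
toy_max_clobbers_tie_second (const struct toy_call *call_1,
			     const struct toy_call *call_2)
{
  assert (call_1 && call_2);
  if (call_1->simd_p == call_2->simd_p)
    return call_2;
  return call_2->simd_p ? call_1 : call_2;
}

/* The symmetric test from the review, parameterised on the hook.  */
static bool
toy_same_clobbers_p (toy_hook hook, const struct toy_call *call1,
		     const struct toy_call *call2)
{
  return hook (call1, call2) == call1 && hook (call2, call1) == call2;
}
```

With two SIMD calls, toy_same_clobbers_p is true for toy_max_clobbers
but false for toy_max_clobbers_tie_second, so the symmetric test only
detects equal clobber sets when ties return the first argument.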

Steve Ellcey
sellcey@marvell.com


2019-01-09  Steve Ellcey  <sellcey@marvell.com>

	* config/aarch64/aarch64.c (aarch64_simd_call_p): New function.
	(aarch64_hard_regno_call_part_clobbered): Add insn argument.
	(aarch64_return_call_with_max_clobbers): New function.
	(TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New macro.
	* config/avr/avr.c (avr_hard_regno_call_part_clobbered): Add insn
	argument.
	* config/i386/i386.c (ix86_hard_regno_call_part_clobbered): Ditto.
	* config/mips/mips.c (mips_hard_regno_call_part_clobbered): Ditto.
	* config/rs6000/rs6000.c (rs6000_hard_regno_call_part_clobbered): Ditto.
	* config/s390/s390.c (s390_hard_regno_call_part_clobbered): Ditto.
	* cselib.c (cselib_process_insn): Add argument to
	targetm.hard_regno_call_part_clobbered call.
	* ira-conflicts.c (ira_build_conflicts): Ditto.
	* ira-costs.c (ira_tune_allocno_costs): Ditto.
	* lra-constraints.c (inherit_reload_reg): Ditto.
	* lra-int.h (struct lra_reg): Add call_insn field, remove call_p field.
	* lra-lives.c (check_pseudos_live_through_calls): Add call_insn
	argument.  Call targetm.return_call_with_max_clobbers.
	Add argument to targetm.hard_regno_call_part_clobbered call.
	(calls_have_same_clobbers_p): New function.
	(process_bb_lives): Add call_insn and last_call_insn variables.
	Pass call_insn to check_pseudos_live_through_calls.
	Modify if stmt to check targetm.return_call_with_max_clobbers.
	Update setting of flush variable.
	* lra.c (initialize_lra_reg_info_element): Set call_insn to NULL.
	* regcprop.c (copyprop_hardreg_forward_1): Add argument to
	targetm.hard_regno_call_part_clobbered call.
	* reginfo.c (choose_hard_reg_mode): Ditto.
	* regrename.c (check_new_reg_p): Ditto.
	* reload.c (find_equiv_reg): Ditto.
	* reload1.c (emit_reload_insns): Ditto.
	* sched-deps.c (deps_analyze_insn): Ditto.
	* sel-sched.c (init_regs_for_mode): Ditto.
	(mark_unavailable_hard_regs): Ditto.
	* targhooks.c (default_dwarf_frame_reg_mode): Ditto.
	* target.def (hard_regno_call_part_clobbered): Add insn argument.
	(return_call_with_max_clobbers): New target function.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in (TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New hook.
	* hooks.c (hook_bool_uint_mode_false): Change to
	hook_bool_insn_uint_mode_false.
	* hooks.h (hook_bool_uint_mode_false): Ditto.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1c300af..d88be6c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1655,14 +1655,56 @@ aarch64_reg_save_mode (tree fndecl, unsigned regno)
 	   : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode);
 }
 
+/* Return true if the instruction is a call to a SIMD function, false
+   if it is not a SIMD function or if we do not know anything about
+   the function.  */
+
+static bool
+aarch64_simd_call_p (rtx_insn *insn)
+{
+  rtx symbol;
+  rtx call;
+  tree fndecl;
+
+  gcc_assert (CALL_P (insn));
+  call = get_call_rtx_from (insn);
+  symbol = XEXP (XEXP (call, 0), 0);
+  if (GET_CODE (symbol) != SYMBOL_REF)
+    return false;
+  fndecl = SYMBOL_REF_DECL (symbol);
+  if (!fndecl)
+    return false;
+
+  return aarch64_simd_decl_p (fndecl);
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
    the lower 64 bits of a 128-bit register.  Tell the compiler the callee
    clobbers the top 64 bits when restoring the bottom 64 bits.  */
 
 static bool
-aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+aarch64_hard_regno_call_part_clobbered (rtx_insn *insn, unsigned int regno,
+					machine_mode mode)
 {
-  return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);
+  bool simd_p = insn && CALL_P (insn) && aarch64_simd_call_p (insn);
+  return FP_REGNUM_P (regno)
+	 && maybe_gt (GET_MODE_SIZE (mode), simd_p ? 16 : 8);
+}
+
+/* Implement TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.  */
+
+rtx_insn *
+aarch64_return_call_with_max_clobbers (rtx_insn *call_1, rtx_insn *call_2)
+{
+  gcc_assert (CALL_P (call_1) && CALL_P (call_2));
+
+  if (aarch64_simd_call_p (call_1) == aarch64_simd_call_p (call_2))
+    return call_1;
+
+  if (aarch64_simd_call_p (call_2))
+    return call_1;
+  else
+    return call_2;
 }
 
 /* Implement REGMODE_NATURAL_SIZE.  */
@@ -18825,6 +18867,10 @@ aarch64_libgcc_floating_mode_supported_p
 #define TARGET_HARD_REGNO_CALL_PART_CLOBBERED \
   aarch64_hard_regno_call_part_clobbered
 
+#undef TARGET_RETURN_CALL_WITH_MAX_CLOBBERS
+#define TARGET_RETURN_CALL_WITH_MAX_CLOBBERS \
+  aarch64_return_call_with_max_clobbers
+
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT aarch64_constant_alignment
 
diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
index 023308b..a53b909 100644
--- a/gcc/config/avr/avr.c
+++ b/gcc/config/avr/avr.c
@@ -12181,7 +12181,8 @@ avr_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  */
 
 static bool
-avr_hard_regno_call_part_clobbered (unsigned regno, machine_mode mode)
+avr_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				    unsigned regno, machine_mode mode)
 {
   /* FIXME: This hook gets called with MODE:REGNO combinations that don't
         represent valid hard registers like, e.g. HI:29.  Returning TRUE
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 0e23eaa..1bb535a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -40216,7 +40216,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
    the low 16 bytes are saved.  */
 
 static bool
-ix86_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+ix86_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				     unsigned int regno, machine_mode mode)
 {
   return SSE_REGNO_P (regno) && GET_MODE_SIZE (mode) > 16;
 }
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 95dc946..a8022b8 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -12906,7 +12906,8 @@ mips_hard_regno_scratch_ok (unsigned int regno)
    registers with MODE > 64 bits are part clobbered too.  */
 
 static bool
-mips_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+mips_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				     unsigned int regno, machine_mode mode)
 {
   if (TARGET_FLOATXX
       && hard_regno_nregs (regno, mode) == 1
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 0357dc8..3330b68 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -2197,7 +2197,8 @@ rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  */
 
 static bool
-rs6000_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+rs6000_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				       unsigned int regno, machine_mode mode)
 {
   if (TARGET_32BIT
       && TARGET_POWERPC64
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index ea2be10..6a571a3 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -10098,7 +10098,8 @@ s390_hard_regno_scratch_ok (unsigned int regno)
    bytes are saved across calls, however.  */
 
 static bool
-s390_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+s390_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				     unsigned int regno, machine_mode mode)
 {
   if (!TARGET_64BIT
       && TARGET_ZARCH
diff --git a/gcc/cselib.c b/gcc/cselib.c
index cef4bc0..84c17c2 100644
--- a/gcc/cselib.c
+++ b/gcc/cselib.c
@@ -2770,7 +2770,7 @@ cselib_process_insn (rtx_insn *insn)
 	if (call_used_regs[i]
 	    || (REG_VALUES (i) && REG_VALUES (i)->elt
 		&& (targetm.hard_regno_call_part_clobbered
-		    (i, GET_MODE (REG_VALUES (i)->elt->val_rtx)))))
+		    (insn, i, GET_MODE (REG_VALUES (i)->elt->val_rtx)))))
 	  cselib_invalidate_regno (i, reg_raw_mode[i]);
 
       /* Since it is not clear how cselib is going to be used, be
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 976a700..97a2ade 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -1707,6 +1707,8 @@ of @code{CALL_USED_REGISTERS}.
 @cindex call-saved register
 @hook TARGET_HARD_REGNO_CALL_PART_CLOBBERED
 
+@hook TARGET_RETURN_CALL_WITH_MAX_CLOBBERS
+
 @findex fixed_regs
 @findex call_used_regs
 @findex global_regs
diff --git a/gcc/hooks.c b/gcc/hooks.c
index bbc35fc..f95659b 100644
--- a/gcc/hooks.c
+++ b/gcc/hooks.c
@@ -142,7 +142,7 @@ hook_bool_puint64_puint64_true (poly_uint64, poly_uint64)
 
 /* Generic hook that takes (unsigned int, machine_mode) and returns false.  */
 bool
-hook_bool_uint_mode_false (unsigned int, machine_mode)
+hook_bool_insn_uint_mode_false (rtx_insn *, unsigned int, machine_mode)
 {
   return false;
 }
diff --git a/gcc/hooks.h b/gcc/hooks.h
index 9e4bc29..0bc8117 100644
--- a/gcc/hooks.h
+++ b/gcc/hooks.h
@@ -40,7 +40,8 @@ extern bool hook_bool_const_rtx_insn_const_rtx_insn_true (const rtx_insn *,
 extern bool hook_bool_mode_uhwi_false (machine_mode,
 				       unsigned HOST_WIDE_INT);
 extern bool hook_bool_puint64_puint64_true (poly_uint64, poly_uint64);
-extern bool hook_bool_uint_mode_false (unsigned int, machine_mode);
+extern bool hook_bool_insn_uint_mode_false (rtx_insn *, unsigned int,
+					    machine_mode);
 extern bool hook_bool_uint_mode_true (unsigned int, machine_mode);
 extern bool hook_bool_tree_false (tree);
 extern bool hook_bool_const_tree_false (const_tree);
diff --git a/gcc/ira-conflicts.c b/gcc/ira-conflicts.c
index 7ec709d..5bd6c0c 100644
--- a/gcc/ira-conflicts.c
+++ b/gcc/ira-conflicts.c
@@ -808,7 +808,7 @@ ira_build_conflicts (void)
 		 regs must conflict with them.  */
 	      for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 		if (!TEST_HARD_REG_BIT (call_used_reg_set, regno)
-		    && targetm.hard_regno_call_part_clobbered (regno,
+		    && targetm.hard_regno_call_part_clobbered (NULL, regno,
 							       obj_mode))
 		  {
 		    SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), regno);
diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c
index 0ca70a0..a17dae3 100644
--- a/gcc/ira-costs.c
+++ b/gcc/ira-costs.c
@@ -2379,7 +2379,7 @@ ira_tune_allocno_costs (void)
 						   *crossed_calls_clobber_regs)
 		  && (ira_hard_reg_set_intersection_p (regno, mode,
 						       call_used_reg_set)
-		      || targetm.hard_regno_call_part_clobbered (regno,
+		      || targetm.hard_regno_call_part_clobbered (NULL, regno,
 								 mode)))
 		cost += (ALLOCNO_CALL_FREQ (a)
 			 * (ira_memory_move_cost[mode][rclass][0]
diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c
index 1312804..cb4a873 100644
--- a/gcc/lra-assigns.c
+++ b/gcc/lra-assigns.c
@@ -1621,7 +1621,7 @@ lra_assign (bool &fails_p)
        asm is removed and it can result in incorrect allocation.  */
     for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
       if (lra_reg_info[i].nrefs != 0 && reg_renumber[i] >= 0
-	  && lra_reg_info[i].call_p
+	  && lra_reg_info[i].call_insn
 	  && overlaps_hard_reg_set_p (call_used_reg_set,
 				      PSEUDO_REGNO_MODE (i), reg_renumber[i]))
 	gcc_unreachable ();
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 4f434e5..1440451 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -5377,7 +5377,8 @@ need_for_call_save_p (int regno)
 	       : call_used_reg_set,
 	       PSEUDO_REGNO_MODE (regno), reg_renumber[regno])
 	      || (targetm.hard_regno_call_part_clobbered
-		  (reg_renumber[regno], PSEUDO_REGNO_MODE (regno)))));
+		  (lra_reg_info[regno].call_insn,
+		   reg_renumber[regno], PSEUDO_REGNO_MODE (regno)))));
 }
 
 /* Global registers occurring in the current EBB.  */
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 9d9e81d..d0a8fac 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -91,10 +91,6 @@ struct lra_reg
   /* True if the pseudo should not be assigned to a stack register.  */
   bool no_stack_p;
 #endif
-  /* True if the pseudo crosses a call.	 It is setup in lra-lives.c
-     and used to check that the pseudo crossing a call did not get a
-     call used hard register.  */
-  bool call_p;
   /* Number of references and execution frequencies of the register in
      *non-debug* insns.	 */
   int nrefs, freq;
@@ -107,6 +103,8 @@ struct lra_reg
   int val;
   /* Offset from relative eliminate register to pesudo reg.  */
   poly_int64 offset;
+  /* Call instruction, if any, that may affect this pseudo reg.  */
+  rtx_insn *call_insn;
   /* These members are set up in lra-lives.c and updated in
      lra-coalesce.c.  */
   /* The biggest size mode in which each pseudo reg is referred in
diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
index a00ec38..61149e1 100644
--- a/gcc/lra-lives.c
+++ b/gcc/lra-lives.c
@@ -579,22 +579,32 @@ lra_setup_reload_pseudo_preferenced_hard_reg (int regno,
    PSEUDOS_LIVE_THROUGH_CALLS and PSEUDOS_LIVE_THROUGH_SETJUMPS.  */
 static inline void
 check_pseudos_live_through_calls (int regno,
-				  HARD_REG_SET last_call_used_reg_set)
+				  HARD_REG_SET last_call_used_reg_set,
+				  rtx_insn *call_insn)
 {
   int hr;
+  rtx_insn *old_call_insn;
 
   if (! sparseset_bit_p (pseudos_live_through_calls, regno))
     return;
+
+  gcc_assert (call_insn && CALL_P (call_insn));
+  old_call_insn = lra_reg_info[regno].call_insn;
+  if (!old_call_insn
+      || (targetm.return_call_with_max_clobbers
+	  && targetm.return_call_with_max_clobbers (old_call_insn, call_insn)
+	     == call_insn))
+    lra_reg_info[regno].call_insn = call_insn;
+
   sparseset_clear_bit (pseudos_live_through_calls, regno);
   IOR_HARD_REG_SET (lra_reg_info[regno].conflict_hard_regs,
 		    last_call_used_reg_set);
 
   for (hr = 0; HARD_REGISTER_NUM_P (hr); hr++)
-    if (targetm.hard_regno_call_part_clobbered (hr,
+    if (targetm.hard_regno_call_part_clobbered (call_insn, hr,
 						PSEUDO_REGNO_MODE (regno)))
       add_to_hard_reg_set (&lra_reg_info[regno].conflict_hard_regs,
 			   PSEUDO_REGNO_MODE (regno), hr);
-  lra_reg_info[regno].call_p = true;
   if (! sparseset_bit_p (pseudos_live_through_setjumps, regno))
     return;
   sparseset_clear_bit (pseudos_live_through_setjumps, regno);
@@ -615,6 +625,19 @@ reg_early_clobber_p (const struct lra_insn_reg *reg, int n_alt)
 		  && TEST_BIT (reg->early_clobber_alts, n_alt))));
 }
 
+/* Return true if call instructions CALL1 and CALL2 use ABIs that
+   preserve the same set of registers.  */
+
+static bool
+calls_have_same_clobbers_p (rtx_insn *call1, rtx_insn *call2)
+{
+  if (!targetm.return_call_with_max_clobbers)
+    return false;
+
+  return (targetm.return_call_with_max_clobbers (call1, call2) == call1
+          && targetm.return_call_with_max_clobbers (call2, call1) == call2);
+}
+
 /* Process insns of the basic block BB to update pseudo live ranges,
    pseudo hard register conflicts, and insn notes.  We do it on
    backward scan of BB insns.  CURR_POINT is the program point where
@@ -635,6 +658,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
   rtx link, *link_loc;
   bool need_curr_point_incr;
   HARD_REG_SET last_call_used_reg_set;
+  rtx_insn *call_insn = NULL;
+  rtx_insn *last_call_insn = NULL;
 
   reg_live_out = df_get_live_out (bb);
   sparseset_clear (pseudos_live);
@@ -847,7 +872,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 	      update_pseudo_point (reg->regno, curr_point, USE_POINT);
 	      mark_regno_live (reg->regno, reg->biggest_mode);
 	      check_pseudos_live_through_calls (reg->regno,
-						last_call_used_reg_set);
+						last_call_used_reg_set,
+						call_insn);
 	    }
 
 	  if (!HARD_REGISTER_NUM_P (reg->regno))
@@ -896,7 +922,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 
       if (call_p)
 	{
-	  if (! flag_ipa_ra)
+	  call_insn = curr_insn;
+	  if (! flag_ipa_ra && ! targetm.return_call_with_max_clobbers)
 	    COPY_HARD_REG_SET(last_call_used_reg_set, call_used_reg_set);
 	  else
 	    {
@@ -906,17 +933,22 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 
 	      bool flush = (! hard_reg_set_empty_p (last_call_used_reg_set)
 			    && ! hard_reg_set_equal_p (last_call_used_reg_set,
-						       this_call_used_reg_set));
+						       this_call_used_reg_set)
+			    && ! calls_have_same_clobbers_p (call_insn,
+							     last_call_insn));
 
 	      EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, j)
 		{
 		  IOR_HARD_REG_SET (lra_reg_info[j].actual_call_used_reg_set,
 				    this_call_used_reg_set);
+
 		  if (flush)
-		    check_pseudos_live_through_calls
-		      (j, last_call_used_reg_set);
+		    check_pseudos_live_through_calls (j,
+						      last_call_used_reg_set,
+						      curr_insn);
 		}
 	      COPY_HARD_REG_SET(last_call_used_reg_set, this_call_used_reg_set);
+	      last_call_insn = call_insn;
 	    }
 
 	  sparseset_ior (pseudos_live_through_calls,
@@ -956,7 +988,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 	      update_pseudo_point (reg->regno, curr_point, USE_POINT);
 	    mark_regno_live (reg->regno, reg->biggest_mode);
 	    check_pseudos_live_through_calls (reg->regno,
-					      last_call_used_reg_set);
+					      last_call_used_reg_set,
+					      call_insn);
 	  }
 
       for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
@@ -1125,7 +1158,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
       if (sparseset_cardinality (pseudos_live_through_calls) == 0)
 	break;
       if (sparseset_bit_p (pseudos_live_through_calls, j))
-	check_pseudos_live_through_calls (j, last_call_used_reg_set);
+	check_pseudos_live_through_calls (j, last_call_used_reg_set, call_insn);
     }
 
   for (i = 0; HARD_REGISTER_NUM_P (i); ++i)
@@ -1359,7 +1392,6 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
 	lra_reg_info[i].biggest_mode = GET_MODE (regno_reg_rtx[i]);
       else
 	lra_reg_info[i].biggest_mode = VOIDmode;
-      lra_reg_info[i].call_p = false;
       if (!HARD_REGISTER_NUM_P (i)
 	  && lra_reg_info[i].nrefs != 0)
 	{
diff --git a/gcc/lra.c b/gcc/lra.c
index 592b990..e00e6e7 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -1344,6 +1344,7 @@ initialize_lra_reg_info_element (int i)
   lra_reg_info[i].val = get_new_reg_value ();
   lra_reg_info[i].offset = 0;
   lra_reg_info[i].copies = NULL;
+  lra_reg_info[i].call_insn = NULL;
 }
 
 /* Initialize common reg info and copies.  */
diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index b107ea2..e6bdeb0 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -1054,7 +1054,7 @@ copyprop_hardreg_forward_1 (basic_block bb, struct value_data *vd)
 	  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 	    if ((TEST_HARD_REG_BIT (regs_invalidated_by_this_call, regno)
 		 || (targetm.hard_regno_call_part_clobbered
-		     (regno, vd->e[regno].mode)))
+		     (insn, regno, vd->e[regno].mode)))
 		&& (regno < set_regno || regno >= set_regno + set_nregs))
 	      kill_value_regno (regno, 1, vd);
 
diff --git a/gcc/reginfo.c b/gcc/reginfo.c
index 7a7fa4d..315c5ec 100644
--- a/gcc/reginfo.c
+++ b/gcc/reginfo.c
@@ -639,7 +639,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
     if (hard_regno_nregs (regno, mode) == nregs
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
-	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
+	    || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
 	&& maybe_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
@@ -647,7 +647,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
     if (hard_regno_nregs (regno, mode) == nregs
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
-	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
+	    || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
 	&& maybe_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
@@ -655,7 +655,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
     if (hard_regno_nregs (regno, mode) == nregs
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
-	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
+	    || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
 	&& maybe_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
@@ -663,7 +663,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
     if (hard_regno_nregs (regno, mode) == nregs
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
-	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
+	    || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
 	&& maybe_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
@@ -677,7 +677,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
       if (hard_regno_nregs (regno, mode) == nregs
 	  && targetm.hard_regno_mode_ok (regno, mode)
 	  && (!call_saved
-	      || !targetm.hard_regno_call_part_clobbered (regno, mode)))
+	      || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode)))
 	return mode;
     }
 
diff --git a/gcc/regrename.c b/gcc/regrename.c
index a180ced..637b3cb 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -339,9 +339,9 @@ check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
 	 && ! DEBUG_INSN_P (tmp->insn))
 	|| (this_head->need_caller_save_reg
 	    && ! (targetm.hard_regno_call_part_clobbered
-		  (reg, GET_MODE (*tmp->loc)))
+		  (NULL, reg, GET_MODE (*tmp->loc)))
 	    && (targetm.hard_regno_call_part_clobbered
-		(new_reg, GET_MODE (*tmp->loc)))))
+		(NULL, new_reg, GET_MODE (*tmp->loc)))))
       return false;
 
   return true;
diff --git a/gcc/reload.c b/gcc/reload.c
index 3ad11a8..72cc38a 100644
--- a/gcc/reload.c
+++ b/gcc/reload.c
@@ -6912,13 +6912,14 @@ find_equiv_reg (rtx goal, rtx_insn *insn, enum reg_class rclass, int other,
 	  if (regno >= 0 && regno < FIRST_PSEUDO_REGISTER)
 	    for (i = 0; i < nregs; ++i)
 	      if (call_used_regs[regno + i]
-		  || targetm.hard_regno_call_part_clobbered (regno + i, mode))
+		  || targetm.hard_regno_call_part_clobbered (NULL, regno + i,
+							     mode))
 		return 0;
 
 	  if (valueno >= 0 && valueno < FIRST_PSEUDO_REGISTER)
 	    for (i = 0; i < valuenregs; ++i)
 	      if (call_used_regs[valueno + i]
-		  || targetm.hard_regno_call_part_clobbered (valueno + i,
+		  || targetm.hard_regno_call_part_clobbered (NULL, valueno + i,
 							     mode))
 		return 0;
 	}
diff --git a/gcc/reload1.c b/gcc/reload1.c
index 42012e4..bb112d8 100644
--- a/gcc/reload1.c
+++ b/gcc/reload1.c
@@ -8289,7 +8289,8 @@ emit_reload_insns (struct insn_chain *chain)
 			   : out_regno + k);
 		      reg_reloaded_insn[regno + k] = insn;
 		      SET_HARD_REG_BIT (reg_reloaded_valid, regno + k);
-		      if (targetm.hard_regno_call_part_clobbered (regno + k,
+		      if (targetm.hard_regno_call_part_clobbered (NULL,
+								  regno + k,
 								  mode))
 			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
 					  regno + k);
@@ -8369,7 +8370,8 @@ emit_reload_insns (struct insn_chain *chain)
 			   : in_regno + k);
 		      reg_reloaded_insn[regno + k] = insn;
 		      SET_HARD_REG_BIT (reg_reloaded_valid, regno + k);
-		      if (targetm.hard_regno_call_part_clobbered (regno + k,
+		      if (targetm.hard_regno_call_part_clobbered (NULL,
+								  regno + k,
 								  mode))
 			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
 					  regno + k);
@@ -8485,7 +8487,7 @@ emit_reload_insns (struct insn_chain *chain)
 		      CLEAR_HARD_REG_BIT (reg_reloaded_dead, src_regno + k);
 		      SET_HARD_REG_BIT (reg_reloaded_valid, src_regno + k);
 		      if (targetm.hard_regno_call_part_clobbered
-			  (src_regno + k, mode))
+			  (NULL, src_regno + k, mode))
 			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
 					  src_regno + k);
 		      else
diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
index a9e934d..6cf4caf 100644
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -3728,7 +3728,7 @@ deps_analyze_insn (struct deps_desc *deps, rtx_insn *insn)
              Since we only have a choice between 'might be clobbered'
              and 'definitely not clobbered', we must include all
              partly call-clobbered registers here.  */
-	    else if (targetm.hard_regno_call_part_clobbered (i,
+	    else if (targetm.hard_regno_call_part_clobbered (insn, i,
 							     reg_raw_mode[i])
                      || TEST_HARD_REG_BIT (regs_invalidated_by_call, i))
               SET_REGNO_REG_SET (reg_pending_clobbers, i);
diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index bf4b2dd..315f2c0 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -1102,7 +1102,7 @@ init_regs_for_mode (machine_mode mode)
       if (i >= 0)
         continue;
 
-      if (targetm.hard_regno_call_part_clobbered (cur_reg, mode))
+      if (targetm.hard_regno_call_part_clobbered (NULL, cur_reg, mode))
         SET_HARD_REG_BIT (sel_hrd.regs_for_call_clobbered[mode],
                           cur_reg);
 
@@ -1251,7 +1251,7 @@ mark_unavailable_hard_regs (def_t def, struct reg_rename *reg_rename_p,
 
   /* Exclude registers that are partially call clobbered.  */
   if (def->crosses_call
-      && !targetm.hard_regno_call_part_clobbered (regno, mode))
+      && !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
     AND_COMPL_HARD_REG_SET (reg_rename_p->available_for_renaming,
                             sel_hrd.regs_for_call_clobbered[mode]);
 
diff --git a/gcc/target.def b/gcc/target.def
index 2aeb1ff..7ebc90b 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5766,14 +5766,30 @@ DEFHOOK
 (hard_regno_call_part_clobbered,
  "This hook should return true if @var{regno} is partly call-saved and\n\
 partly call-clobbered, and if a value of mode @var{mode} would be partly\n\
-clobbered by a call.  For example, if the low 32 bits of @var{regno} are\n\
-preserved across a call but higher bits are clobbered, this hook should\n\
-return true for a 64-bit mode but false for a 32-bit mode.\n\
+clobbered by call instruction @var{insn}.  If @var{insn} is NULL then it\n\
+should return true if any call could partly clobber the register.\n\
+For example, if the low 32 bits of @var{regno} are preserved across a call\n\
+but higher bits are clobbered, this hook should return true for a 64-bit\n\
+mode but false for a 32-bit mode.\n\
 \n\
 The default implementation returns false, which is correct\n\
 for targets that don't have partly call-clobbered registers.",
- bool, (unsigned int regno, machine_mode mode),
- hook_bool_uint_mode_false)
+ bool, (rtx_insn *insn, unsigned int regno, machine_mode mode),
+ hook_bool_insn_uint_mode_false)
+
+DEFHOOK
+(return_call_with_max_clobbers,
+ "This hook returns a pointer to the call that partially clobbers the\n\
+most registers.  If a platform supports multiple ABIs where the registers\n\
+that are partially clobbered may vary, this function compares two\n\
+calls and returns a pointer to the one that clobbers the most registers.\n\
+If both calls clobber the same registers, @var{call_1} must be returned.\n\
+\n\
+The registers clobbered in different ABIs must be a proper subset or\n\
+superset of all other ABIs.  @var{call_1} must always be a call insn,\n\
+@var{call_2} may be NULL or a call insn.",
+ rtx_insn *, (rtx_insn *call_1, rtx_insn *call_2),
+ NULL)
 
 /* Return the smallest number of different values for which it is best to
    use a jump-table instead of a tree of conditional branches.  */
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 898848f..2cbdc4a 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1930,7 +1930,7 @@ default_dwarf_frame_reg_mode (int regno)
 {
   machine_mode save_mode = reg_raw_mode[regno];
 
-  if (targetm.hard_regno_call_part_clobbered (regno, save_mode))
+  if (targetm.hard_regno_call_part_clobbered (NULL, regno, save_mode))
     save_mode = choose_hard_reg_mode (regno, 1, true);
   return save_mode;
 }
Richard Sandiford Jan. 10, 2019, 9:16 a.m. | #7
Steve Ellcey <sellcey@marvell.com> writes:
> On Wed, 2019-01-09 at 10:00 +0000, Richard Sandiford wrote:
>
> Thanks for the quick turnaround on the comments Richard.  Here is a new
> version where I tried to address all the issues you raised.  One thing
> I noticed is that I think your calls_have_same_clobbers_p function only
> works if, when return_call_with_max_clobbers is called with two calls
> that clobber the same set of registers, it always returns the first
> call.
>
> I don't think my original function had that guarantee but I changed it 
> so that it would and documented that requirement in target.def.  I
> couldn't see a better way to implement the calls_have_same_clobbers_p
> function other than doing that.

Yeah, I think that's a good guarantee to have.
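
To make the guarantee concrete, here is a minimal, self-contained C model
of it.  Everything in it (struct toy_call, toy_max_clobbers,
toy_same_clobbers_p) is a hypothetical stand-in rather than GCC code; it
only assumes, as in the aarch64 port, that a non-SIMD call partially
clobbers more registers than a SIMD call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a call insn: it only records whether the
   callee uses the SIMD ABI.  */
struct toy_call { bool simd_p; };

/* Toy version of return_call_with_max_clobbers: return the call that
   partially clobbers more registers, and CALL_1 on a tie.  */
static const struct toy_call *
toy_max_clobbers (const struct toy_call *call_1,
                  const struct toy_call *call_2)
{
  assert (call_1 != NULL && call_2 != NULL);
  if (!call_1->simd_p || call_2->simd_p)
    return call_1;
  return call_2;
}

/* Toy version of calls_have_same_clobbers_p: the calls are equivalent
   only if each one "wins" when it is passed as the first argument.  */
static bool
toy_same_clobbers_p (const struct toy_call *call1,
                     const struct toy_call *call2)
{
  return (toy_max_clobbers (call1, call2) == call1
          && toy_max_clobbers (call2, call1) == call2);
}
```

Because ties resolve to the first argument, swapping the arguments swaps
the result exactly when the two calls clobber the same registers, which
is what the two-way comparison relies on.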

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 1c300af..d88be6c 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1655,14 +1655,56 @@ aarch64_reg_save_mode (tree fndecl, unsigned regno)
>  /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
>     the lower 64 bits of a 128-bit register.  Tell the compiler the callee
>     clobbers the top 64 bits when restoring the bottom 64 bits.  */
>  
>  static bool
> -aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
> +aarch64_hard_regno_call_part_clobbered (rtx_insn *insn, unsigned int regno,
> +					machine_mode mode)
>  {
> -  return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);
> +  bool simd_p = insn && CALL_P (insn) && aarch64_simd_call_p (insn);
> +  return FP_REGNUM_P (regno)
> +	 && maybe_gt (GET_MODE_SIZE (mode), simd_p ? 16 : 8);
> +}
> +
> +/* Implement TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.  */
> +
> +rtx_insn *
> +aarch64_return_call_with_max_clobbers (rtx_insn *call_1, rtx_insn *call_2)
> +{
> +  gcc_assert (CALL_P (call_1) && CALL_P (call_2));
> +
> +  if (aarch64_simd_call_p (call_1) == aarch64_simd_call_p (call_2))
> +    return call_1;
> +
> +  if (aarch64_simd_call_p (call_2))
> +    return call_1;
> +  else
> +    return call_2;

Think this is simpler as:

  gcc_assert (CALL_P (call_1) && CALL_P (call_2));

  if (!aarch64_simd_call_p (call_1) || aarch64_simd_call_p (call_2))
    return call_1;
  else
    return call_2;
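
A quick way to check that the two forms agree is to enumerate the four
cases.  The sketch below replaces the insns with plain booleans (whether
each call is a SIMD call) and returns 1 or 2 to mean call_1 or call_2;
the function names are made up for the illustration.

```c
#include <stdbool.h>

/* The form in the posted patch.  */
static int
toy_max_clobbers_orig (bool simd_1, bool simd_2)
{
  if (simd_1 == simd_2)
    return 1;

  if (simd_2)
    return 1;
  else
    return 2;
}

/* The suggested simplification.  */
static int
toy_max_clobbers_simple (bool simd_1, bool simd_2)
{
  if (!simd_1 || simd_2)
    return 1;
  else
    return 2;
}

/* Return true if both forms agree on all four input combinations.  */
static bool
toy_forms_agree_p (void)
{
  for (int a = 0; a < 2; a++)
    for (int b = 0; b < 2; b++)
      if (toy_max_clobbers_orig (a, b) != toy_max_clobbers_simple (a, b))
        return false;
  return true;
}
```

Only the case where call_1 is a SIMD call and call_2 is not selects
call_2; every other combination returns call_1.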

> diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
> index a00ec38..61149e1 100644
> --- a/gcc/lra-lives.c
> +++ b/gcc/lra-lives.c
> @@ -579,22 +579,32 @@ lra_setup_reload_pseudo_preferenced_hard_reg (int regno,
>     PSEUDOS_LIVE_THROUGH_CALLS and PSEUDOS_LIVE_THROUGH_SETJUMPS.  */
>  static inline void
>  check_pseudos_live_through_calls (int regno,
> -				  HARD_REG_SET last_call_used_reg_set)
> +				  HARD_REG_SET last_call_used_reg_set,
> +				  rtx_insn *call_insn)

Should document the new parameter.

> @@ -906,17 +933,22 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
>  
>  	      bool flush = (! hard_reg_set_empty_p (last_call_used_reg_set)
>  			    && ! hard_reg_set_equal_p (last_call_used_reg_set,
> -						       this_call_used_reg_set));
> +						       this_call_used_reg_set)
> +			    && ! calls_have_same_clobbers_p (call_insn,
> +							     last_call_insn));

This should be || with the current test, not &&.  We need to check
that last_call_insn is nonnull first.
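
One possible reading of this, written as a standalone predicate.  The
helper and its parameters are hypothetical: the booleans stand in for
the results of hard_reg_set_empty_p, hard_reg_set_equal_p and
calls_have_same_clobbers_p, and have_last_call for last_call_insn being
nonnull.  The exact expression in the final patch may differ.

```c
#include <stdbool.h>

static bool
toy_need_flush_p (bool last_set_empty, bool sets_equal,
                  bool have_last_call, bool same_clobbers)
{
  /* The existing test: the previous call's register set is nonempty
     and differs from this call's set.  */
  bool sets_differ = !last_set_empty && !sets_equal;

  /* The new test, only meaningful once a previous call has been seen.  */
  bool abis_differ = have_last_call && !same_clobbers;

  return sets_differ || abis_differ;
}
```

With &&, two calls that have identical call-used sets but different
partial-clobber behaviour would never trigger a flush, because the
first half of the condition would already be false.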

>  	      EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, j)
>  		{
>  		  IOR_HARD_REG_SET (lra_reg_info[j].actual_call_used_reg_set,
>  				    this_call_used_reg_set);
> +
>  		  if (flush)
> -		    check_pseudos_live_through_calls
> -		      (j, last_call_used_reg_set);
> +		    check_pseudos_live_through_calls (j,
> +						      last_call_used_reg_set,
> +						      curr_insn);
>  		}

Should be last_call_insn rather than curr_insn.  I.e. when we flush,
we apply the properties of the previous call to pseudos live after
the new call.

Looks good otherwise.

Thanks,
Richard
Steve Ellcey Jan. 11, 2019, 12:13 a.m. | #8
OK, I fixed the issues in your last email.  I initially found one
regression while testing.  In lra_create_live_ranges_1 I had removed
the 'call_p = false' statement but did not replace it with anything.
This resulted in no regressions on aarch64 but caused a single
regression on x86 (gcc.target/i386/pr87759.c).  I replaced the
line with 'call_insn = NULL' and the regression went away so I
have clean bootstraps and no regressions on aarch64 and x86 now.

If this looks good to you, can I go ahead and check it in?  I know
we are in Stage 3 now, but my recollection is that patches that were
initially submitted during Stage 1 could go ahead once approved.

Steve Ellcey
sellcey@marvell.com



2019-01-10  Steve Ellcey  <sellcey@marvell.com>

	* config/aarch64/aarch64.c (aarch64_simd_call_p): New function.
	(aarch64_hard_regno_call_part_clobbered): Add insn argument.
	(aarch64_return_call_with_max_clobbers): New function.
	(TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New macro.
	* config/avr/avr.c (avr_hard_regno_call_part_clobbered): Add insn
	argument.
	* config/i386/i386.c (ix86_hard_regno_call_part_clobbered): Ditto.
	* config/mips/mips.c (mips_hard_regno_call_part_clobbered): Ditto.
	* config/rs6000/rs6000.c (rs6000_hard_regno_call_part_clobbered): Ditto.
	* config/s390/s390.c (s390_hard_regno_call_part_clobbered): Ditto.
	* cselib.c (cselib_process_insn): Add argument to
	targetm.hard_regno_call_part_clobbered call.
	* ira-conflicts.c (ira_build_conflicts): Ditto.
	* ira-costs.c (ira_tune_allocno_costs): Ditto.
	* lra-constraints.c (inherit_reload_reg): Ditto.
	* lra-int.h (struct lra_reg): Add call_insn field, remove call_p field.
	* lra-lives.c (check_pseudos_live_through_calls): Add call_insn
	argument.  Call targetm.return_call_with_max_clobbers.
	Add argument to targetm.hard_regno_call_part_clobbered call.
	(calls_have_same_clobbers_p): New function.
	(process_bb_lives): Add call_insn and last_call_insn variables.
	Pass call_insn to check_pseudos_live_through_calls.
	Modify if stmt to check targetm.return_call_with_max_clobbers.
	Update setting of flush variable.
	(lra_create_live_ranges_1): Set call_insn to NULL instead of call_p
	to false.
	* lra.c (initialize_lra_reg_info_element): Set call_insn to NULL.
	* regcprop.c (copyprop_hardreg_forward_1): Add argument to
	targetm.hard_regno_call_part_clobbered call.
	* reginfo.c (choose_hard_reg_mode): Ditto.
	* regrename.c (check_new_reg_p): Ditto.
	* reload.c (find_equiv_reg): Ditto.
	* reload1.c (emit_reload_insns): Ditto.
	* sched-deps.c (deps_analyze_insn): Ditto.
	* sel-sched.c (init_regs_for_mode): Ditto.
	(mark_unavailable_hard_regs): Ditto.
	* targhooks.c (default_dwarf_frame_reg_mode): Ditto.
	* target.def (hard_regno_call_part_clobbered): Add insn argument.
	(return_call_with_max_clobbers): New target function.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in (TARGET_RETURN_CALL_WITH_MAX_CLOBBERS): New hook.
	* hooks.c (hook_bool_uint_mode_false): Change to
	hook_bool_insn_uint_mode_false.
	* hooks.h (hook_bool_uint_mode_false): Ditto.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1c300af..7a1f838 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1655,14 +1655,53 @@ aarch64_reg_save_mode (tree fndecl, unsigned regno)
 	   : (aarch64_simd_decl_p (fndecl) ? E_TFmode : E_DFmode);
 }
 
+/* Return true if the instruction is a call to a SIMD function, false
+   if it is not a SIMD function or if we do not know anything about
+   the function.  */
+
+static bool
+aarch64_simd_call_p (rtx_insn *insn)
+{
+  rtx symbol;
+  rtx call;
+  tree fndecl;
+
+  gcc_assert (CALL_P (insn));
+  call = get_call_rtx_from (insn);
+  symbol = XEXP (XEXP (call, 0), 0);
+  if (GET_CODE (symbol) != SYMBOL_REF)
+    return false;
+  fndecl = SYMBOL_REF_DECL (symbol);
+  if (!fndecl)
+    return false;
+
+  return aarch64_simd_decl_p (fndecl);
+}
+
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  The callee only saves
    the lower 64 bits of a 128-bit register.  Tell the compiler the callee
    clobbers the top 64 bits when restoring the bottom 64 bits.  */
 
 static bool
-aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+aarch64_hard_regno_call_part_clobbered (rtx_insn *insn, unsigned int regno,
+					machine_mode mode)
+{
+  bool simd_p = insn && CALL_P (insn) && aarch64_simd_call_p (insn);
+  return FP_REGNUM_P (regno)
+	 && maybe_gt (GET_MODE_SIZE (mode), simd_p ? 16 : 8);
+}
+
+/* Implement TARGET_RETURN_CALL_WITH_MAX_CLOBBERS.  */
+
+rtx_insn *
+aarch64_return_call_with_max_clobbers (rtx_insn *call_1, rtx_insn *call_2)
 {
-  return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);
+  gcc_assert (CALL_P (call_1) && CALL_P (call_2));
+
+  if (!aarch64_simd_call_p (call_1) || aarch64_simd_call_p (call_2))
+    return call_1;
+  else
+    return call_2;
 }
 
 /* Implement REGMODE_NATURAL_SIZE.  */
@@ -18825,6 +18864,10 @@ aarch64_libgcc_floating_mode_supported_p
 #define TARGET_HARD_REGNO_CALL_PART_CLOBBERED \
   aarch64_hard_regno_call_part_clobbered
 
+#undef TARGET_RETURN_CALL_WITH_MAX_CLOBBERS
+#define TARGET_RETURN_CALL_WITH_MAX_CLOBBERS \
+  aarch64_return_call_with_max_clobbers
+
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT aarch64_constant_alignment
 
diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
index 023308b..a53b909 100644
--- a/gcc/config/avr/avr.c
+++ b/gcc/config/avr/avr.c
@@ -12181,7 +12181,8 @@ avr_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  */
 
 static bool
-avr_hard_regno_call_part_clobbered (unsigned regno, machine_mode mode)
+avr_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				    unsigned regno, machine_mode mode)
 {
   /* FIXME: This hook gets called with MODE:REGNO combinations that don't
         represent valid hard registers like, e.g. HI:29.  Returning TRUE
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 0e23eaa..1bb535a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -40216,7 +40216,8 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
    the low 16 bytes are saved.  */
 
 static bool
-ix86_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+ix86_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				     unsigned int regno, machine_mode mode)
 {
   return SSE_REGNO_P (regno) && GET_MODE_SIZE (mode) > 16;
 }
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 95dc946..a8022b8 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -12906,7 +12906,8 @@ mips_hard_regno_scratch_ok (unsigned int regno)
    registers with MODE > 64 bits are part clobbered too.  */
 
 static bool
-mips_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+mips_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				     unsigned int regno, machine_mode mode)
 {
   if (TARGET_FLOATXX
       && hard_regno_nregs (regno, mode) == 1
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 0357dc8..3330b68 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -2197,7 +2197,8 @@ rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
 /* Implement TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  */
 
 static bool
-rs6000_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+rs6000_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				       unsigned int regno, machine_mode mode)
 {
   if (TARGET_32BIT
       && TARGET_POWERPC64
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index ea2be10..6a571a3 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -10098,7 +10098,8 @@ s390_hard_regno_scratch_ok (unsigned int regno)
    bytes are saved across calls, however.  */
 
 static bool
-s390_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
+s390_hard_regno_call_part_clobbered (rtx_insn *insn ATTRIBUTE_UNUSED,
+				     unsigned int regno, machine_mode mode)
 {
   if (!TARGET_64BIT
       && TARGET_ZARCH
diff --git a/gcc/cselib.c b/gcc/cselib.c
index cef4bc0..84c17c2 100644
--- a/gcc/cselib.c
+++ b/gcc/cselib.c
@@ -2770,7 +2770,7 @@ cselib_process_insn (rtx_insn *insn)
 	if (call_used_regs[i]
 	    || (REG_VALUES (i) && REG_VALUES (i)->elt
 		&& (targetm.hard_regno_call_part_clobbered
-		    (i, GET_MODE (REG_VALUES (i)->elt->val_rtx)))))
+		    (insn, i, GET_MODE (REG_VALUES (i)->elt->val_rtx)))))
 	  cselib_invalidate_regno (i, reg_raw_mode[i]);
 
       /* Since it is not clear how cselib is going to be used, be
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 976a700..97a2ade 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -1707,6 +1707,8 @@ of @code{CALL_USED_REGISTERS}.
 @cindex call-saved register
 @hook TARGET_HARD_REGNO_CALL_PART_CLOBBERED
 
+@hook TARGET_RETURN_CALL_WITH_MAX_CLOBBERS
+
 @findex fixed_regs
 @findex call_used_regs
 @findex global_regs
diff --git a/gcc/hooks.c b/gcc/hooks.c
index bbc35fc..f95659b 100644
--- a/gcc/hooks.c
+++ b/gcc/hooks.c
@@ -142,7 +142,7 @@ hook_bool_puint64_puint64_true (poly_uint64, poly_uint64)
 
 /* Generic hook that takes (unsigned int, machine_mode) and returns false.  */
 bool
-hook_bool_uint_mode_false (unsigned int, machine_mode)
+hook_bool_insn_uint_mode_false (rtx_insn *, unsigned int, machine_mode)
 {
   return false;
 }
diff --git a/gcc/hooks.h b/gcc/hooks.h
index 9e4bc29..0bc8117 100644
--- a/gcc/hooks.h
+++ b/gcc/hooks.h
@@ -40,7 +40,8 @@ extern bool hook_bool_const_rtx_insn_const_rtx_insn_true (const rtx_insn *,
 extern bool hook_bool_mode_uhwi_false (machine_mode,
 				       unsigned HOST_WIDE_INT);
 extern bool hook_bool_puint64_puint64_true (poly_uint64, poly_uint64);
-extern bool hook_bool_uint_mode_false (unsigned int, machine_mode);
+extern bool hook_bool_insn_uint_mode_false (rtx_insn *, unsigned int,
+					    machine_mode);
 extern bool hook_bool_uint_mode_true (unsigned int, machine_mode);
 extern bool hook_bool_tree_false (tree);
 extern bool hook_bool_const_tree_false (const_tree);
diff --git a/gcc/ira-conflicts.c b/gcc/ira-conflicts.c
index 7ec709d..5bd6c0c 100644
--- a/gcc/ira-conflicts.c
+++ b/gcc/ira-conflicts.c
@@ -808,7 +808,7 @@ ira_build_conflicts (void)
 		 regs must conflict with them.  */
 	      for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 		if (!TEST_HARD_REG_BIT (call_used_reg_set, regno)
-		    && targetm.hard_regno_call_part_clobbered (regno,
+		    && targetm.hard_regno_call_part_clobbered (NULL, regno,
 							       obj_mode))
 		  {
 		    SET_HARD_REG_BIT (OBJECT_CONFLICT_HARD_REGS (obj), regno);
diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c
index 0ca70a0..a17dae3 100644
--- a/gcc/ira-costs.c
+++ b/gcc/ira-costs.c
@@ -2379,7 +2379,7 @@ ira_tune_allocno_costs (void)
 						   *crossed_calls_clobber_regs)
 		  && (ira_hard_reg_set_intersection_p (regno, mode,
 						       call_used_reg_set)
-		      || targetm.hard_regno_call_part_clobbered (regno,
+		      || targetm.hard_regno_call_part_clobbered (NULL, regno,
 								 mode)))
 		cost += (ALLOCNO_CALL_FREQ (a)
 			 * (ira_memory_move_cost[mode][rclass][0]
diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c
index 1312804..cb4a873 100644
--- a/gcc/lra-assigns.c
+++ b/gcc/lra-assigns.c
@@ -1621,7 +1621,7 @@ lra_assign (bool &fails_p)
        asm is removed and it can result in incorrect allocation.  */
     for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
       if (lra_reg_info[i].nrefs != 0 && reg_renumber[i] >= 0
-	  && lra_reg_info[i].call_p
+	  && lra_reg_info[i].call_insn
 	  && overlaps_hard_reg_set_p (call_used_reg_set,
 				      PSEUDO_REGNO_MODE (i), reg_renumber[i]))
 	gcc_unreachable ();
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 4f434e5..1440451 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -5377,7 +5377,8 @@ need_for_call_save_p (int regno)
 	       : call_used_reg_set,
 	       PSEUDO_REGNO_MODE (regno), reg_renumber[regno])
 	      || (targetm.hard_regno_call_part_clobbered
-		  (reg_renumber[regno], PSEUDO_REGNO_MODE (regno)))));
+		  (lra_reg_info[regno].call_insn,
+		   reg_renumber[regno], PSEUDO_REGNO_MODE (regno)))));
 }
 
 /* Global registers occurring in the current EBB.  */
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 9d9e81d..d0a8fac 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -91,10 +91,6 @@ struct lra_reg
   /* True if the pseudo should not be assigned to a stack register.  */
   bool no_stack_p;
 #endif
-  /* True if the pseudo crosses a call.	 It is setup in lra-lives.c
-     and used to check that the pseudo crossing a call did not get a
-     call used hard register.  */
-  bool call_p;
   /* Number of references and execution frequencies of the register in
      *non-debug* insns.	 */
   int nrefs, freq;
@@ -107,6 +103,8 @@ struct lra_reg
   int val;
   /* Offset from relative eliminate register to pesudo reg.  */
   poly_int64 offset;
+  /* Call instruction, if any, that may affect this pseudo reg.  */
+  rtx_insn *call_insn;
   /* These members are set up in lra-lives.c and updated in
      lra-coalesce.c.  */
   /* The biggest size mode in which each pseudo reg is referred in
diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
index a00ec38..b77b675 100644
--- a/gcc/lra-lives.c
+++ b/gcc/lra-lives.c
@@ -576,25 +576,39 @@ lra_setup_reload_pseudo_preferenced_hard_reg (int regno,
 
 /* Check that REGNO living through calls and setjumps, set up conflict
    regs using LAST_CALL_USED_REG_SET, and clear corresponding bits in
-   PSEUDOS_LIVE_THROUGH_CALLS and PSEUDOS_LIVE_THROUGH_SETJUMPS.  */
+   PSEUDOS_LIVE_THROUGH_CALLS and PSEUDOS_LIVE_THROUGH_SETJUMPS.
+   CALL_INSN may be the specific call we want to check that REGNO lives
+   through or a call that is guaranteed to clobber REGNO if any call
+   in the current block clobbers REGNO.  */
+
 static inline void
 check_pseudos_live_through_calls (int regno,
-				  HARD_REG_SET last_call_used_reg_set)
+				  HARD_REG_SET last_call_used_reg_set,
+				  rtx_insn *call_insn)
 {
   int hr;
+  rtx_insn *old_call_insn;
 
   if (! sparseset_bit_p (pseudos_live_through_calls, regno))
     return;
+
+  gcc_assert (call_insn && CALL_P (call_insn));
+  old_call_insn = lra_reg_info[regno].call_insn;
+  if (!old_call_insn
+      || (targetm.return_call_with_max_clobbers
+	  && targetm.return_call_with_max_clobbers (old_call_insn, call_insn)
+	     == call_insn))
+    lra_reg_info[regno].call_insn = call_insn;
+
   sparseset_clear_bit (pseudos_live_through_calls, regno);
   IOR_HARD_REG_SET (lra_reg_info[regno].conflict_hard_regs,
 		    last_call_used_reg_set);
 
   for (hr = 0; HARD_REGISTER_NUM_P (hr); hr++)
-    if (targetm.hard_regno_call_part_clobbered (hr,
+    if (targetm.hard_regno_call_part_clobbered (call_insn, hr,
 						PSEUDO_REGNO_MODE (regno)))
       add_to_hard_reg_set (&lra_reg_info[regno].conflict_hard_regs,
 			   PSEUDO_REGNO_MODE (regno), hr);
-  lra_reg_info[regno].call_p = true;
   if (! sparseset_bit_p (pseudos_live_through_setjumps, regno))
     return;
   sparseset_clear_bit (pseudos_live_through_setjumps, regno);
@@ -615,6 +629,19 @@ reg_early_clobber_p (const struct lra_insn_reg *reg, int n_alt)
 		  && TEST_BIT (reg->early_clobber_alts, n_alt))));
 }
 
+/* Return true if call instructions CALL1 and CALL2 use ABIs that
+   preserve the same set of registers.  */
+
+static bool
+calls_have_same_clobbers_p (rtx_insn *call1, rtx_insn *call2)
+{
+  if (!targetm.return_call_with_max_clobbers)
+    return false;
+
+  return (targetm.return_call_with_max_clobbers (call1, call2) == call1
+          && targetm.return_call_with_max_clobbers (call2, call1) == call2);
+}
+
 /* Process insns of the basic block BB to update pseudo live ranges,
    pseudo hard register conflicts, and insn notes.  We do it on
    backward scan of BB insns.  CURR_POINT is the program point where
@@ -635,6 +662,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
   rtx link, *link_loc;
   bool need_curr_point_incr;
   HARD_REG_SET last_call_used_reg_set;
+  rtx_insn *call_insn = NULL;
+  rtx_insn *last_call_insn = NULL;
 
   reg_live_out = df_get_live_out (bb);
   sparseset_clear (pseudos_live);
@@ -847,7 +876,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 	      update_pseudo_point (reg->regno, curr_point, USE_POINT);
 	      mark_regno_live (reg->regno, reg->biggest_mode);
 	      check_pseudos_live_through_calls (reg->regno,
-						last_call_used_reg_set);
+						last_call_used_reg_set,
+						call_insn);
 	    }
 
 	  if (!HARD_REGISTER_NUM_P (reg->regno))
@@ -896,7 +926,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 
       if (call_p)
 	{
-	  if (! flag_ipa_ra)
+	  call_insn = curr_insn;
+	  if (! flag_ipa_ra && ! targetm.return_call_with_max_clobbers)
 	    COPY_HARD_REG_SET(last_call_used_reg_set, call_used_reg_set);
 	  else
 	    {
@@ -905,18 +936,24 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 				      call_used_reg_set);
 
 	      bool flush = (! hard_reg_set_empty_p (last_call_used_reg_set)
-			    && ! hard_reg_set_equal_p (last_call_used_reg_set,
-						       this_call_used_reg_set));
+			    && ( ! hard_reg_set_equal_p (last_call_used_reg_set,
+						       this_call_used_reg_set)))
+			   || (last_call_insn && ! calls_have_same_clobbers_p
+						     (call_insn,
+						      last_call_insn));
 
 	      EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, j)
 		{
 		  IOR_HARD_REG_SET (lra_reg_info[j].actual_call_used_reg_set,
 				    this_call_used_reg_set);
+
 		  if (flush)
-		    check_pseudos_live_through_calls
-		      (j, last_call_used_reg_set);
+		    check_pseudos_live_through_calls (j,
+						      last_call_used_reg_set,
+						      last_call_insn);
 		}
 	      COPY_HARD_REG_SET(last_call_used_reg_set, this_call_used_reg_set);
+	      last_call_insn = call_insn;
 	    }
 
 	  sparseset_ior (pseudos_live_through_calls,
@@ -956,7 +993,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 	      update_pseudo_point (reg->regno, curr_point, USE_POINT);
 	    mark_regno_live (reg->regno, reg->biggest_mode);
 	    check_pseudos_live_through_calls (reg->regno,
-					      last_call_used_reg_set);
+					      last_call_used_reg_set,
+					      call_insn);
 	  }
 
       for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
@@ -1125,7 +1163,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
       if (sparseset_cardinality (pseudos_live_through_calls) == 0)
 	break;
       if (sparseset_bit_p (pseudos_live_through_calls, j))
-	check_pseudos_live_through_calls (j, last_call_used_reg_set);
+	check_pseudos_live_through_calls (j, last_call_used_reg_set, call_insn);
     }
 
   for (i = 0; HARD_REGISTER_NUM_P (i); ++i)
@@ -1359,7 +1397,7 @@ lra_create_live_ranges_1 (bool all_p, bool dead_insn_p)
 	lra_reg_info[i].biggest_mode = GET_MODE (regno_reg_rtx[i]);
       else
 	lra_reg_info[i].biggest_mode = VOIDmode;
-      lra_reg_info[i].call_p = false;
+      lra_reg_info[i].call_insn = NULL;
       if (!HARD_REGISTER_NUM_P (i)
 	  && lra_reg_info[i].nrefs != 0)
 	{
diff --git a/gcc/lra.c b/gcc/lra.c
index 592b990..e00e6e7 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -1344,6 +1344,7 @@ initialize_lra_reg_info_element (int i)
   lra_reg_info[i].val = get_new_reg_value ();
   lra_reg_info[i].offset = 0;
   lra_reg_info[i].copies = NULL;
+  lra_reg_info[i].call_insn = NULL;
 }
 
 /* Initialize common reg info and copies.  */
diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index b107ea2..e6bdeb0 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -1054,7 +1054,7 @@ copyprop_hardreg_forward_1 (basic_block bb, struct value_data *vd)
 	  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 	    if ((TEST_HARD_REG_BIT (regs_invalidated_by_this_call, regno)
 		 || (targetm.hard_regno_call_part_clobbered
-		     (regno, vd->e[regno].mode)))
+		     (insn, regno, vd->e[regno].mode)))
 		&& (regno < set_regno || regno >= set_regno + set_nregs))
 	      kill_value_regno (regno, 1, vd);
 
diff --git a/gcc/reginfo.c b/gcc/reginfo.c
index 7a7fa4d..315c5ec 100644
--- a/gcc/reginfo.c
+++ b/gcc/reginfo.c
@@ -639,7 +639,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
     if (hard_regno_nregs (regno, mode) == nregs
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
-	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
+	    || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
 	&& maybe_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
@@ -647,7 +647,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
     if (hard_regno_nregs (regno, mode) == nregs
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
-	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
+	    || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
 	&& maybe_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
@@ -655,7 +655,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
     if (hard_regno_nregs (regno, mode) == nregs
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
-	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
+	    || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
 	&& maybe_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
@@ -663,7 +663,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
     if (hard_regno_nregs (regno, mode) == nregs
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
-	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
+	    || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
 	&& maybe_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
@@ -677,7 +677,7 @@ choose_hard_reg_mode (unsigned int regno ATTRIBUTE_UNUSED,
       if (hard_regno_nregs (regno, mode) == nregs
 	  && targetm.hard_regno_mode_ok (regno, mode)
 	  && (!call_saved
-	      || !targetm.hard_regno_call_part_clobbered (regno, mode)))
+	      || !targetm.hard_regno_call_part_clobbered (NULL, regno, mode)))
 	return mode;
     }
 
diff --git a/gcc/regrename.c b/gcc/regrename.c
index a180ced..637b3cb 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -339,9 +339,9 @@ check_new_reg_p (int reg ATTRIBUTE_UNUSED, int new_reg,
 	 && ! DEBUG_INSN_P (tmp->insn))
 	|| (this_head->need_caller_save_reg
 	    && ! (targetm.hard_regno_call_part_clobbered
-		  (reg, GET_MODE (*tmp->loc)))
+		  (NULL, reg, GET_MODE (*tmp->loc)))
 	    && (targetm.hard_regno_call_part_clobbered
-		(new_reg, GET_MODE (*tmp->loc)))))
+		(NULL, new_reg, GET_MODE (*tmp->loc)))))
       return false;
 
   return true;
diff --git a/gcc/reload.c b/gcc/reload.c
index 3ad11a8..72cc38a 100644
--- a/gcc/reload.c
+++ b/gcc/reload.c
@@ -6912,13 +6912,14 @@ find_equiv_reg (rtx goal, rtx_insn *insn, enum reg_class rclass, int other,
 	  if (regno >= 0 && regno < FIRST_PSEUDO_REGISTER)
 	    for (i = 0; i < nregs; ++i)
 	      if (call_used_regs[regno + i]
-		  || targetm.hard_regno_call_part_clobbered (regno + i, mode))
+		  || targetm.hard_regno_call_part_clobbered (NULL, regno + i,
+							     mode))
 		return 0;
 
 	  if (valueno >= 0 && valueno < FIRST_PSEUDO_REGISTER)
 	    for (i = 0; i < valuenregs; ++i)
 	      if (call_used_regs[valueno + i]
-		  || targetm.hard_regno_call_part_clobbered (valueno + i,
+		  || targetm.hard_regno_call_part_clobbered (NULL, valueno + i,
 							     mode))
 		return 0;
 	}
diff --git a/gcc/reload1.c b/gcc/reload1.c
index 42012e4..bb112d8 100644
--- a/gcc/reload1.c
+++ b/gcc/reload1.c
@@ -8289,7 +8289,8 @@ emit_reload_insns (struct insn_chain *chain)
 			   : out_regno + k);
 		      reg_reloaded_insn[regno + k] = insn;
 		      SET_HARD_REG_BIT (reg_reloaded_valid, regno + k);
-		      if (targetm.hard_regno_call_part_clobbered (regno + k,
+		      if (targetm.hard_regno_call_part_clobbered (NULL,
+								  regno + k,
 								  mode))
 			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
 					  regno + k);
@@ -8369,7 +8370,8 @@ emit_reload_insns (struct insn_chain *chain)
 			   : in_regno + k);
 		      reg_reloaded_insn[regno + k] = insn;
 		      SET_HARD_REG_BIT (reg_reloaded_valid, regno + k);
-		      if (targetm.hard_regno_call_part_clobbered (regno + k,
+		      if (targetm.hard_regno_call_part_clobbered (NULL,
+								  regno + k,
 								  mode))
 			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
 					  regno + k);
@@ -8485,7 +8487,7 @@ emit_reload_insns (struct insn_chain *chain)
 		      CLEAR_HARD_REG_BIT (reg_reloaded_dead, src_regno + k);
 		      SET_HARD_REG_BIT (reg_reloaded_valid, src_regno + k);
 		      if (targetm.hard_regno_call_part_clobbered
-			  (src_regno + k, mode))
+			  (NULL, src_regno + k, mode))
 			SET_HARD_REG_BIT (reg_reloaded_call_part_clobbered,
 					  src_regno + k);
 		      else
diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
index a9e934d..6cf4caf 100644
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -3728,7 +3728,7 @@ deps_analyze_insn (struct deps_desc *deps, rtx_insn *insn)
              Since we only have a choice between 'might be clobbered'
              and 'definitely not clobbered', we must include all
              partly call-clobbered registers here.  */
-	    else if (targetm.hard_regno_call_part_clobbered (i,
+	    else if (targetm.hard_regno_call_part_clobbered (insn, i,
 							     reg_raw_mode[i])
                      || TEST_HARD_REG_BIT (regs_invalidated_by_call, i))
               SET_REGNO_REG_SET (reg_pending_clobbers, i);
diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index bf4b2dd..315f2c0 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -1102,7 +1102,7 @@ init_regs_for_mode (machine_mode mode)
       if (i >= 0)
         continue;
 
-      if (targetm.hard_regno_call_part_clobbered (cur_reg, mode))
+      if (targetm.hard_regno_call_part_clobbered (NULL, cur_reg, mode))
         SET_HARD_REG_BIT (sel_hrd.regs_for_call_clobbered[mode],
                           cur_reg);
 
@@ -1251,7 +1251,7 @@ mark_unavailable_hard_regs (def_t def, struct reg_rename *reg_rename_p,
 
   /* Exclude registers that are partially call clobbered.  */
   if (def->crosses_call
-      && !targetm.hard_regno_call_part_clobbered (regno, mode))
+      && !targetm.hard_regno_call_part_clobbered (NULL, regno, mode))
     AND_COMPL_HARD_REG_SET (reg_rename_p->available_for_renaming,
                             sel_hrd.regs_for_call_clobbered[mode]);
 
diff --git a/gcc/target.def b/gcc/target.def
index 2aeb1ff..7ebc90b 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5766,14 +5766,30 @@ DEFHOOK
 (hard_regno_call_part_clobbered,
  "This hook should return true if @var{regno} is partly call-saved and\n\
 partly call-clobbered, and if a value of mode @var{mode} would be partly\n\
-clobbered by a call.  For example, if the low 32 bits of @var{regno} are\n\
-preserved across a call but higher bits are clobbered, this hook should\n\
-return true for a 64-bit mode but false for a 32-bit mode.\n\
+clobbered by call instruction @var{insn}.  If @var{insn} is NULL then it\n\
+should return true if any call could partly clobber the register.\n\
+For example, if the low 32 bits of @var{regno} are preserved across a call\n\
+but higher bits are clobbered, this hook should return true for a 64-bit\n\
+mode but false for a 32-bit mode.\n\
 \n\
 The default implementation returns false, which is correct\n\
 for targets that don't have partly call-clobbered registers.",
- bool, (unsigned int regno, machine_mode mode),
- hook_bool_uint_mode_false)
+ bool, (rtx_insn *insn, unsigned int regno, machine_mode mode),
+ hook_bool_insn_uint_mode_false)
+
+DEFHOOK
+(return_call_with_max_clobbers,
+ "This hook returns a pointer to the call that partially clobbers the\n\
+most registers.  If a platform supports multiple ABIs where the registers\n\
+that are partially clobbered may vary, this function compares two\n\
+calls and returns a pointer to the one that clobbers the most registers.\n\
+If both calls clobber the same registers, @var{call_1} must be returned.\n\
+\n\
+The registers clobbered in different ABIs must be a proper subset or\n\
+superset of all other ABIs.  @var{call_1} must always be a call insn,\n\
+@var{call_2} may be NULL or a call insn.",
+ rtx_insn *, (rtx_insn *call_1, rtx_insn *call_2),
+ NULL)
 
 /* Return the smallest number of different values for which it is best to
    use a jump-table instead of a tree of conditional branches.  */
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 898848f..2cbdc4a 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1930,7 +1930,7 @@ default_dwarf_frame_reg_mode (int regno)
 {
   machine_mode save_mode = reg_raw_mode[regno];
 
-  if (targetm.hard_regno_call_part_clobbered (regno, save_mode))
+  if (targetm.hard_regno_call_part_clobbered (NULL, regno, save_mode))
     save_mode = choose_hard_reg_mode (regno, 1, true);
   return save_mode;
 }
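
The tie-breaking rule documented for return_call_with_max_clobbers (return call_1 when both calls clobber the same registers) is what lets calls_have_same_clobbers_p detect equivalence by calling the hook twice with the arguments swapped. Below is a minimal stand-alone sketch of that idea, not GCC code. The names Abi, Call, max_clobbers and same_clobbers are hypothetical, and the assumption that the base ABI partly clobbers a superset of what the SIMD ABI partly clobbers is only illustrative.

```cpp
#include <cassert>

// Hypothetical stand-in for a call insn: only the callee's ABI is modelled.
enum class Abi { Base, Simd };

struct Call {
  Abi abi;
};

// Model of a target's return_call_with_max_clobbers hook.
// Assumption: Base-ABI calls partly clobber a superset of the registers
// that Simd-ABI calls partly clobber.  call1 must be non-null, call2 may
// be null, and ties return call1.
const Call *max_clobbers(const Call *call1, const Call *call2) {
  assert(call1 != nullptr);
  if (call2 != nullptr && call1->abi == Abi::Simd && call2->abi == Abi::Base)
    return call2;
  return call1;
}

// Same shape as calls_have_same_clobbers_p in the patch: two calls are
// equivalent exactly when each one wins the tie-break when passed first.
bool same_clobbers(const Call *call1, const Call *call2) {
  return max_clobbers(call1, call2) == call1
         && max_clobbers(call2, call1) == call2;
}
```

In this model, two base-ABI calls (or two SIMD calls) compare as equivalent, while a base/SIMD pair does not. The function in the patch additionally returns false when the target leaves the hook undefined, which the sketch omits.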
Richard Sandiford Jan. 11, 2019, 11:22 a.m. | #9
Steve Ellcey <sellcey@marvell.com> writes:
> OK, I fixed the issues in your last email.  I initially found one
> regression while testing.  In lra_create_live_ranges_1 I had removed
> the 'call_p = false' statement but did not replace it with anything.
> This resulted in no regressions on aarch64 but caused a single
> regression on x86 (gcc.target/i386/pr87759.c).  I replaced the
> line with 'call_insn = NULL' and the regression went away so I
> have clean bootstraps and no regressions on aarch64 and x86 now.

Looks good to me bar the parameter description below.

> If this looks good to you can I go ahead and check it in?  I know
> we are in Stage 3 now, but my recollection is that patches that were
> initially submitted during Stage 1 could go ahead once approved.

Yeah, like you say, this was originally posted in stage 1 and is the
last patch in the series.  Not committing it would leave the work
incomplete in GCC 9.  The problem is that we're now in stage 4 rather
than stage 3.

I don't think I'm neutral enough to make the call.  Richard, Jakub?

> diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
> index a00ec38..b77b675 100644
> --- a/gcc/lra-lives.c
> +++ b/gcc/lra-lives.c
> @@ -576,25 +576,39 @@ lra_setup_reload_pseudo_preferenced_hard_reg (int regno,
>  
>  /* Check that REGNO living through calls and setjumps, set up conflict
>     regs using LAST_CALL_USED_REG_SET, and clear corresponding bits in
> -   PSEUDOS_LIVE_THROUGH_CALLS and PSEUDOS_LIVE_THROUGH_SETJUMPS.  */
> +   PSEUDOS_LIVE_THROUGH_CALLS and PSEUDOS_LIVE_THROUGH_SETJUMPS.
> +   CALL_INSN may be the specific call we want to check that REGNO lives
> +   through or a call that is guaranteed to clobber REGNO if any call
> +   in the current block clobbers REGNO.  */

I think it would be more accurate to say:

   CALL_INSN is a call that is representative of all calls in the region
   described by the PSEUDOS_LIVE_THROUGH_* sets, in terms of the registers
   that it preserves and clobbers.  */

> +
>  static inline void
>  check_pseudos_live_through_calls (int regno,
> -				  HARD_REG_SET last_call_used_reg_set)
> +				  HARD_REG_SET last_call_used_reg_set,
> +				  rtx_insn *call_insn)
>  {
>    int hr;
> +  rtx_insn *old_call_insn;
>  
>    if (! sparseset_bit_p (pseudos_live_through_calls, regno))
>      return;
> +
> +  gcc_assert (call_insn && CALL_P (call_insn));
> +  old_call_insn = lra_reg_info[regno].call_insn;
> +  if (!old_call_insn
> +      || (targetm.return_call_with_max_clobbers
> +	  && targetm.return_call_with_max_clobbers (old_call_insn, call_insn)
> +	     == call_insn))
> +    lra_reg_info[regno].call_insn = call_insn;
> +
>    sparseset_clear_bit (pseudos_live_through_calls, regno);
>    IOR_HARD_REG_SET (lra_reg_info[regno].conflict_hard_regs,
>  		    last_call_used_reg_set);
>  
>    for (hr = 0; HARD_REGISTER_NUM_P (hr); hr++)
> -    if (targetm.hard_regno_call_part_clobbered (hr,
> +    if (targetm.hard_regno_call_part_clobbered (call_insn, hr,
>  						PSEUDO_REGNO_MODE (regno)))
>        add_to_hard_reg_set (&lra_reg_info[regno].conflict_hard_regs,
>  			   PSEUDO_REGNO_MODE (regno), hr);
> -  lra_reg_info[regno].call_p = true;
>    if (! sparseset_bit_p (pseudos_live_through_setjumps, regno))
>      return;
>    sparseset_clear_bit (pseudos_live_through_setjumps, regno);

BTW, I think we could save some compile time by moving the "for" loop
into the new "if", since otherwise call_insn should have no new
information.  But that was true before as well (since we could have
skipped the loop if lra_reg_info[regno].call_p was already true),
so it's really a separate issue.

Thanks,
Richard
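
The compile-time idea mentioned at the end of the message above (running the hard-register loop only when the stored representative call actually changes) can be sketched outside GCC as follows. This is a simplified model under stated assumptions, not the LRA implementation: PseudoInfo, part_clobbered, max_clobbers, record_call and the register count of 8 are all hypothetical, and the base ABI is assumed to partly clobber registers 4..7 while the SIMD ABI partly clobbers none.

```cpp
#include <bitset>
#include <cassert>

constexpr int kNumHardRegs = 8;  // hypothetical hard register count

enum class Abi { Base, Simd };
struct Call { Abi abi; };

// Per-pseudo state, loosely modelled on lra_reg_info[regno].
struct PseudoInfo {
  const Call *call_insn = nullptr;        // representative call so far
  std::bitset<kNumHardRegs> conflicts;    // conflicting hard registers
};

// Hypothetical stand-in for hard_regno_call_part_clobbered.
bool part_clobbered(const Call *call, int hr) {
  return call->abi == Abi::Base && hr >= 4;
}

// Hypothetical stand-in for return_call_with_max_clobbers (ties return a).
const Call *max_clobbers(const Call *a, const Call *b) {
  if (b != nullptr && a->abi == Abi::Simd && b->abi == Abi::Base)
    return b;
  return a;
}

// Record that the pseudo lives through CALL.  The loop over hard registers
// runs only when CALL replaces the stored representative; otherwise CALL
// clobbers no more than what is already recorded.  Returns the number of
// part_clobbered queries made, so the saving is observable.
int record_call(PseudoInfo &info, const Call *call) {
  assert(call != nullptr);
  int queries = 0;
  const Call *old_call = info.call_insn;
  if (old_call == nullptr || max_clobbers(old_call, call) == call) {
    info.call_insn = call;
    for (int hr = 0; hr < kNumHardRegs; ++hr) {
      ++queries;
      if (part_clobbered(call, hr))
        info.conflicts.set(hr);
    }
  }
  return queries;
}
```

Under these assumptions, a pseudo that has already seen a base-ABI call makes no further hook queries for later base or SIMD calls, and its conflict set is unchanged, which matches the observation that such calls carry no new information.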
Jakub Jelinek Jan. 11, 2019, 11:30 a.m. | #10
On Fri, Jan 11, 2019 at 11:22:59AM +0000, Richard Sandiford wrote:
> Steve Ellcey <sellcey@marvell.com> writes:
> > If this looks good to you can I go ahead and check it in?  I know
> > we are in Stage 3 now, but my recollection is that patches that were
> > initially submitted during Stage 1 could go ahead once approved.
> 
> Yeah, like you say, this was originally posted in stage 1 and is the
> last patch in the series.  Not committing it would leave the work
> incomplete in GCC 9.  The problem is that we're now in stage 4 rather
> than stage 3.
> 
> I don't think I'm neutral enough to make the call.  Richard, Jakub?

I'm ok with accepting this late as an exception, if it can be committed
reasonably soon (within a week or at most two).

	Jakub
Steve Ellcey Jan. 11, 2019, 2:02 p.m. | #11
On Fri, 2019-01-11 at 12:30 +0100, Jakub Jelinek wrote:
> 
> > Yeah, like you say, this was originally posted in stage 1 and is the
> > last patch in the series.  Not committing it would leave the work
> > incomplete in GCC 9.  The problem is that we're now in stage 4 rather
> > than stage 3.
> > 
> > I don't think I'm neutral enough to make the call.  Richard, Jakub?
> 
> I'm ok with accepting this late as an exception, if it can be committed
> reasonably soon (within a week or at most two).
> 
> 	Jakub

OK, I will make the comment change and check it in.  Note that this
does not give us the complete implementation yet.  Patch 3/4 was OK'ed
by Richard last week but I hadn't checked that in either.  So I will
check both these in later today unless I hear otherwise.

That still leaves Patch 2/4 which is Aarch64 specific.

https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01421.html

Jakub had some comments on the test changes which I fixed but I did
not get any feedback on the actual code changes so I am not sure if 
that is OK or not.

Steve Ellcey
sellcey@marvell.com

Patch

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index c82c7b6..c2de4111 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1480,6 +1480,17 @@  aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
   return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);
 }
 
+/* Implement TARGET_CHECK_PART_CLOBBERED.  Calls to SIMD functions never
+   partially clobber registers, so return false for them.  */
+
+static bool
+aarch64_check_part_clobbered (rtx_insn *insn)
+{
+  if (aarch64_simd_call_p (insn))
+    return false;
+  return true;
+}
+
 /* Implement REGMODE_NATURAL_SIZE.  */
 poly_uint64
 aarch64_regmode_natural_size (machine_mode mode)
@@ -18294,6 +18305,9 @@  aarch64_libgcc_floating_mode_supported_p
 #define TARGET_HARD_REGNO_CALL_PART_CLOBBERED \
   aarch64_hard_regno_call_part_clobbered
 
+#undef TARGET_CHECK_PART_CLOBBERED
+#define TARGET_CHECK_PART_CLOBBERED aarch64_check_part_clobbered
+
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT aarch64_constant_alignment
 
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index e8af1bf..7dd6c54 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -1704,6 +1704,8 @@  of @code{CALL_USED_REGISTERS}.
 @cindex call-saved register
 @hook TARGET_HARD_REGNO_CALL_PART_CLOBBERED
 
+@hook TARGET_CHECK_PART_CLOBBERED
+
 @findex fixed_regs
 @findex call_used_regs
 @findex global_regs
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index ab61989..89483d3 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -5325,16 +5325,23 @@  inherit_reload_reg (bool def_p, int original_regno,
 static inline bool
 need_for_call_save_p (int regno)
 {
+  machine_mode pmode = PSEUDO_REGNO_MODE (regno);
+  int new_regno = reg_renumber[regno];
+
   lra_assert (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0);
-  return (usage_insns[regno].calls_num < calls_num
-	  && (overlaps_hard_reg_set_p
-	      ((flag_ipa_ra &&
-		! hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
-	       ? lra_reg_info[regno].actual_call_used_reg_set
-	       : call_used_reg_set,
-	       PSEUDO_REGNO_MODE (regno), reg_renumber[regno])
-	      || (targetm.hard_regno_call_part_clobbered
-		  (reg_renumber[regno], PSEUDO_REGNO_MODE (regno)))));
+
+  if (usage_insns[regno].calls_num >= calls_num)
+    return false;
+
+  if (flag_ipa_ra
+      && !hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
+    return (overlaps_hard_reg_set_p
+		(lra_reg_info[regno].actual_call_used_reg_set, pmode, new_regno)
+	    || (lra_reg_info[regno].check_part_clobbered
+		&& targetm.hard_regno_call_part_clobbered (new_regno, pmode)));
+  else
+    return (overlaps_hard_reg_set_p (call_used_reg_set, pmode, new_regno)
+            || targetm.hard_regno_call_part_clobbered (new_regno, pmode));
 }
 
 /* Global registers occurring in the current EBB.  */
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 5267b53..e6aacd2 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -117,6 +117,8 @@  struct lra_reg
   /* This member is set up in lra-lives.c for subsequent
      assignments.  */
   lra_copy_t copies;
+  /* Whether or not the register is partially clobbered.  */
+  bool check_part_clobbered;
 };
 
 /* References to the common info about each register.  */
diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
index 0bf8cd0..b2dfe0e 100644
--- a/gcc/lra-lives.c
+++ b/gcc/lra-lives.c
@@ -597,7 +597,8 @@  lra_setup_reload_pseudo_preferenced_hard_reg (int regno,
    PSEUDOS_LIVE_THROUGH_CALLS and PSEUDOS_LIVE_THROUGH_SETJUMPS.  */
 static inline void
 check_pseudos_live_through_calls (int regno,
-				  HARD_REG_SET last_call_used_reg_set)
+				  HARD_REG_SET last_call_used_reg_set,
+				  bool check_partial_clobber)
 {
   int hr;
 
@@ -607,11 +608,12 @@  check_pseudos_live_through_calls (int regno,
   IOR_HARD_REG_SET (lra_reg_info[regno].conflict_hard_regs,
 		    last_call_used_reg_set);
 
-  for (hr = 0; hr < FIRST_PSEUDO_REGISTER; hr++)
-    if (targetm.hard_regno_call_part_clobbered (hr,
-						PSEUDO_REGNO_MODE (regno)))
-      add_to_hard_reg_set (&lra_reg_info[regno].conflict_hard_regs,
-			   PSEUDO_REGNO_MODE (regno), hr);
+  if (check_partial_clobber)
+    for (hr = 0; hr < FIRST_PSEUDO_REGISTER; hr++)
+      if (targetm.hard_regno_call_part_clobbered (hr,
+						  PSEUDO_REGNO_MODE (regno)))
+        add_to_hard_reg_set (&lra_reg_info[regno].conflict_hard_regs,
+			     PSEUDO_REGNO_MODE (regno), hr);
   lra_reg_info[regno].call_p = true;
   if (! sparseset_bit_p (pseudos_live_through_setjumps, regno))
     return;
@@ -652,6 +654,7 @@  process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
   rtx_insn *next;
   rtx link, *link_loc;
   bool need_curr_point_incr;
+  bool partial_clobber_in_bb;
   HARD_REG_SET last_call_used_reg_set;
   
   reg_live_out = df_get_live_out (bb);
@@ -673,6 +676,18 @@  process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
   if (lra_dump_file != NULL)
     fprintf (lra_dump_file, "  BB %d\n", bb->index);
 
+  /* Check to see if any call might do a partial clobber.  */
+  partial_clobber_in_bb = false;
+  FOR_BB_INSNS_REVERSE_SAFE (bb, curr_insn, next)
+    {
+      if (CALL_P (curr_insn)
+          && targetm.check_part_clobbered (curr_insn))
+        {
+          partial_clobber_in_bb = true;
+          break;
+        }
+    }
+
   /* Scan the code of this basic block, noting which pseudos and hard
      regs are born or die.
 
@@ -850,7 +865,8 @@  process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 		|= mark_regno_live (reg->regno, reg->biggest_mode,
 				    curr_point);
 	      check_pseudos_live_through_calls (reg->regno,
-						last_call_used_reg_set);
+						last_call_used_reg_set,
+						partial_clobber_in_bb);
 	    }
 
 	  if (reg->regno >= FIRST_PSEUDO_REGISTER)
@@ -913,9 +929,14 @@  process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 		{
 		  IOR_HARD_REG_SET (lra_reg_info[j].actual_call_used_reg_set,
 				    this_call_used_reg_set);
+
+		  if (targetm.check_part_clobbered (curr_insn))
+		    lra_reg_info[j].check_part_clobbered = true;
+
 		  if (flush)
-		    check_pseudos_live_through_calls
-		      (j, last_call_used_reg_set);
+		    check_pseudos_live_through_calls (j,
+						      last_call_used_reg_set,
+						      partial_clobber_in_bb);
 		}
 	      COPY_HARD_REG_SET(last_call_used_reg_set, this_call_used_reg_set);
 	    }
@@ -946,7 +967,8 @@  process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
 	      |= mark_regno_live (reg->regno, reg->biggest_mode,
 				  curr_point);
 	    check_pseudos_live_through_calls (reg->regno,
-					      last_call_used_reg_set);
+					      last_call_used_reg_set,
+					      partial_clobber_in_bb);
 	  }
 
       for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
@@ -1102,7 +1124,9 @@  process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
       if (sparseset_cardinality (pseudos_live_through_calls) == 0)
 	break;
       if (sparseset_bit_p (pseudos_live_through_calls, j))
-	check_pseudos_live_through_calls (j, last_call_used_reg_set);
+	check_pseudos_live_through_calls (j,
+					  last_call_used_reg_set,
+					  partial_clobber_in_bb);
     }
 
   for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
diff --git a/gcc/lra.c b/gcc/lra.c
index 5d58d90..8831286 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -1344,6 +1344,7 @@  initialize_lra_reg_info_element (int i)
   lra_reg_info[i].val = get_new_reg_value ();
   lra_reg_info[i].offset = 0;
   lra_reg_info[i].copies = NULL;
+  lra_reg_info[i].check_part_clobbered = false;
 }
 
 /* Initialize common reg info and copies.  */
diff --git a/gcc/target.def b/gcc/target.def
index 4b166d1..b3c2c72 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5757,6 +5757,15 @@  for targets that don't have partly call-clobbered registers.",
  bool, (unsigned int regno, machine_mode mode),
  hook_bool_uint_mode_false)
 
+DEFHOOK
+(
+ check_part_clobbered,
+ "This hook should return true if the call @var{insn} must obey the\n\
+ hard_regno_call_part_clobbered target hook, and false if that hook can be\n\
+ ignored because the callee is known not to partially clobber any registers.",
+ bool, (rtx_insn *insn),
+ hook_bool_rtx_insn_true)
+
 /* Return the smallest number of different values for which it is best to
    use a jump-table instead of a tree of conditional branches.  */
 DEFHOOK