Patchwork [IRA] Analysis of register usage of functions for usage by IRA.

Submitter Tom de Vries
Date Dec. 6, 2013, 12:47 a.m.
Message ID <52A11E8E.8090103@mentor.com>
Permalink /patch/297501/
State New

Comments

Tom de Vries - Dec. 6, 2013, 12:47 a.m.
On 14-03-13 10:34, Tom de Vries wrote:
>> I thought about implementing your optimization for LRA by myself. But it
>> is ok if you decide to work on it.  At least, I am not going to start
>> this work for a month.
>>> I'm also currently looking at how to use the analysis in LRA.
>>> AFAIU, in lra-constraints.c we do a backward scan over the insns, and keep track
>>> of how many calls we've seen (calls_num), and mark insns with that number. Then
>>> when looking at a live-range segment consisting of a def or use insn a and a
>>> following use insn b, we can compare the number of calls seen for each insn, and
>>> if they're not equal there is at least one call between the 2 insns, and if the
>>> corresponding hard register is clobbered by calls, we spill after insn a and
>>> restore before insn b.
>>>
>>> That is too coarse-grained to use with our analysis, since we need to know which
>>> calls occur in between insn a and insn b, and more precisely which registers
>>> those calls clobbered.
>>>
>>> I wonder though if we can do something similar: we keep an array
>>> call_clobbers_num[FIRST_PSEUDO_REG], initialized at 0 when we start scanning.
>>> When encountering a call, we increase the call_clobbers_num entries for the hard
>>> registers clobbered by the call.
>>> When encountering a use, we set the call_clobbers_num field of the use to
>>> call_clobbers_num[reg_renumber[original_regno]].
>>> And when looking at a live-range segment, we compare the clobbers_num field of
>>> insn a and insn b, and if it is not equal, the hard register was clobbered by at
>>> least one call between insn a and insn b.
>>> Would that work? WDYT?
>>
>> As I understand you looked at live-range splitting code in
>> lra-constraints.c.  To get necessary info you should look at ira-lives.c.
> Unfortunately I haven't been able to find time to work further on the LRA part.
> So if you're still willing to pick up that part, that would be great.

Vladimir,

I gave this a try. The attached patch works for the included test-case for x86_64.

I've bootstrapped and reg-tested the patch (in combination with the other 
patches from the series) on x86_64.

OK for stage1?

Thanks,
- Tom
Vladimir Makarov - Jan. 14, 2014, 7:36 p.m.
On 12/05/2013 07:47 PM, Tom de Vries wrote:
> [earlier thread quoted in full; snipped]
> OK for stage1?
>
Yes, it is ok for stage1.  Thanks for not forgetting LRA, and sorry for
the delay with the answer (it is not a high-priority patch for me right
now).

I believe this patch also helps to improve code because of better
spilling into SSE regs.  Spilling into SSE regs instead of memory is
rare right now, as all SSE regs are call-clobbered.

Thanks again, Tom.

Patch

2013-12-04  Tom de Vries  <tom@codesourcery.com>

	* lra-int.h (struct lra_reg): Add field actual_call_used_reg_set.
	* lra.c (initialize_lra_reg_info_element): Add init of
	actual_call_used_reg_set field.
	(lra): Call lra_create_live_ranges before lra_inheritance for
	-fuse-caller-save.
	* lra-assigns.c (lra_assign): Allow call_used_regs to cross calls for
	-fuse-caller-save.
	* lra-constraints.c (need_for_call_save_p): Use actual_call_used_reg_set
	instead of call_used_reg_set for -fuse-caller-save.
	* lra-lives.c (process_bb_lives): Calculate actual_call_used_reg_set.

	* gcc.target/i386/fuse-caller-save.c: New test.
	* gcc.dg/ira-shrinkwrap-prep-1.c: Run with -fno-use-caller-save.

diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c
index 88fc693..943b349 100644
--- a/gcc/lra-assigns.c
+++ b/gcc/lra-assigns.c
@@ -1413,6 +1413,7 @@  lra_assign (void)
   bitmap_head insns_to_process;
   bool no_spills_p;
   int max_regno = max_reg_num ();
+  unsigned int call_used_reg_crosses_call = 0;
 
   timevar_push (TV_LRA_ASSIGN);
   init_lives ();
@@ -1425,14 +1426,22 @@  lra_assign (void)
   bitmap_initialize (&all_spilled_pseudos, &reg_obstack);
   create_live_range_start_chains ();
   setup_live_pseudos_and_spill_after_risky_transforms (&all_spilled_pseudos);
-#ifdef ENABLE_CHECKING
   for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
     if (lra_reg_info[i].nrefs != 0 && reg_renumber[i] >= 0
 	&& lra_reg_info[i].call_p
 	&& overlaps_hard_reg_set_p (call_used_reg_set,
 				    PSEUDO_REGNO_MODE (i), reg_renumber[i]))
-      gcc_unreachable ();
-#endif
+      {
+	if (!flag_use_caller_save)
+	  gcc_unreachable ();
+	call_used_reg_crosses_call++;
+      }
+  if (lra_dump_file
+      && call_used_reg_crosses_call > 0)
+    fprintf (lra_dump_file,
+	     "Found %u pseudo(s) with a call used reg crossing a call.\n"
+	     "Allowing due to -fuse-caller-save\n",
+	     call_used_reg_crosses_call);    
   /* Setup insns to process on the next constraint pass.  */
   bitmap_initialize (&changed_pseudo_bitmap, &reg_obstack);
   init_live_reload_and_inheritance_pseudos ();
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index bb5242a..d0939dc 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -4438,7 +4438,10 @@  need_for_call_save_p (int regno)
   lra_assert (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0);
   return (usage_insns[regno].calls_num < calls_num
 	  && (overlaps_hard_reg_set_p
-	      (call_used_reg_set,
+	      ((flag_use_caller_save &&
+		! hard_reg_set_empty_p (lra_reg_info[regno].actual_call_used_reg_set))
+	       ? lra_reg_info[regno].actual_call_used_reg_set
+	       : call_used_reg_set,
 	       PSEUDO_REGNO_MODE (regno), reg_renumber[regno])
 	      || HARD_REGNO_CALL_PART_CLOBBERED (reg_renumber[regno],
 						 PSEUDO_REGNO_MODE (regno))));
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index 6d8d80f..f2b8079 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -77,6 +77,10 @@  struct lra_reg
   /* The following fields are defined only for pseudos.	 */
   /* Hard registers with which the pseudo conflicts.  */
   HARD_REG_SET conflict_hard_regs;
+  /* Call used registers with which the pseudo conflicts, taking into account
+     the registers used by functions called from calls which cross the
+     pseudo. */
+  HARD_REG_SET actual_call_used_reg_set;
   /* We assign hard registers to reload pseudos which can occur in few
      places.  So two hard register preferences are enough for them.
      The following fields define the preferred hard registers.	If
diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
index efc19f2..774d6c2 100644
--- a/gcc/lra-lives.c
+++ b/gcc/lra-lives.c
@@ -624,6 +624,17 @@  process_bb_lives (basic_block bb, int &curr_point)
 
       if (call_p)
 	{
+	  if (flag_use_caller_save)
+	    {
+	      HARD_REG_SET this_call_used_reg_set;
+	      get_call_reg_set_usage (curr_insn, &this_call_used_reg_set,
+				      call_used_reg_set);
+
+	      EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, j)
+		IOR_HARD_REG_SET (lra_reg_info[j].actual_call_used_reg_set,
+				  this_call_used_reg_set);
+	    }
+ 
 	  sparseset_ior (pseudos_live_through_calls,
 			 pseudos_live_through_calls, pseudos_live);
 	  if (cfun->has_nonlocal_label
diff --git a/gcc/lra.c b/gcc/lra.c
index d0d9bcb..599f95a 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -1427,6 +1427,7 @@  initialize_lra_reg_info_element (int i)
   lra_reg_info[i].no_stack_p = false;
 #endif
   CLEAR_HARD_REG_SET (lra_reg_info[i].conflict_hard_regs);
+  CLEAR_HARD_REG_SET (lra_reg_info[i].actual_call_used_reg_set);
   lra_reg_info[i].preferred_hard_regno1 = -1;
   lra_reg_info[i].preferred_hard_regno2 = -1;
   lra_reg_info[i].preferred_hard_regno_profit1 = 0;
@@ -2343,7 +2344,18 @@  lra (FILE *f)
 	  lra_eliminate (false, false);
 	  /* Do inheritance only for regular algorithms.  */
 	  if (! lra_simple_p)
-	    lra_inheritance ();
+	    {
+	      if (flag_use_caller_save)
+		{
+		  if (live_p)
+		    lra_clear_live_ranges ();
+		  /* As a side-effect of lra_create_live_ranges, we calculate
+		     actual_call_used_reg_set,  which is needed during
+		     lra_inheritance.  */
+		  lra_create_live_ranges (true);
+		}
+	      lra_inheritance ();
+	    }
 	  if (live_p)
 	    lra_clear_live_ranges ();
 	  /* We need live ranges for lra_assign -- so build them.  */
diff --git a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c
index 54d3e76..a386fab 100644
--- a/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c
+++ b/gcc/testsuite/gcc.dg/ira-shrinkwrap-prep-1.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile { target { { x86_64-*-* && lp64 } || { powerpc*-*-* && lp64 } } } } */
-/* { dg-options "-O3 -fdump-rtl-ira -fdump-rtl-pro_and_epilogue"  } */
+/* { dg-options "-O3 -fdump-rtl-ira -fdump-rtl-pro_and_epilogue -fno-use-caller-save"  } */
 
 long __attribute__((noinline, noclone))
 foo (long a)
diff --git a/gcc/testsuite/gcc.target/i386/fuse-caller-save.c b/gcc/testsuite/gcc.target/i386/fuse-caller-save.c
new file mode 100644
index 0000000..c5d620c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/fuse-caller-save.c
@@ -0,0 +1,26 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fuse-caller-save -fdump-rtl-reload" } */
+/* Testing -fuse-caller-save optimization option.  */
+
+static int __attribute__((noinline))
+bar (int x)
+{
+  return x + 3;
+}
+
+int __attribute__((noinline))
+foo (int y)
+{
+  return y + bar (y);
+}
+
+int
+main (void)
+{
+  return !(foo (5) == 13);
+}
+
+/* { dg-final { scan-rtl-dump-times "Found 1 pseudo.* with a call used reg crossing a call" 1 "reload" } } */
+/* { dg-final { scan-rtl-dump-times "Found .* pseudo.* with a call used reg crossing a call" 1 "reload" } } */
+/* { dg-final { scan-rtl-dump-times "Allowing due to -fuse-caller-save" 1 "reload" } } */
+/* { dg-final { cleanup-rtl-dump "reload" } } */