diff mbox

a patch for PR68695

Message ID 56FAAD47.8090300@redhat.com
State New
Headers show

Commit Message

Vladimir Makarov March 29, 2016, 4:28 p.m. UTC
The following patch improves the code in 2 out of 3 cases in

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68695

   The patch uses more accurate costs for the RA cost improvement 
optimization after colouring.

   The patch was tested and bootstrapped on x86-64.  It is hard to 
create a  test to check the correct code generation.  Therefore there is 
no test.  As the patch changes heuristics, a generated code in some 
cases will be different but at least x86-64 tests expecting a specific 
code are not broken by the patch.

   Committed as rev.  234527

Comments

Christophe Lyon March 30, 2016, 9:23 p.m. UTC | #1
On 29 March 2016 at 18:28, Vladimir Makarov <vmakarov@redhat.com> wrote:
>   The following patch improves the code in 2 out of 3 cases in
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68695
>
>   The patch uses more accurate costs for the RA cost improvement
> optimization after colouring.
>
>   The patch was tested and bootstrapped on x86-64.  It is hard to create a
> test to check the correct code generation.  Therefore there is no test.  As
> the patch changes heuristics, a generated code in some cases will be
> different but at least x86-64 tests expecting a specific code are not broken
> by the patch.
>
>   Committed as rev.  234527
>

Hi,

I've noticed that after this patch, 2 tests regress (PASS -> FAIL) on arm:
  gcc.dg/ira-shrinkwrap-prep-2.c scan-rtl-dump pro_and_epilogue
"Performing shrink-wrapping"
  gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping"

Christophe
Vladimir Makarov April 1, 2016, 5:45 p.m. UTC | #2
On 03/30/2016 05:23 PM, Christophe Lyon wrote:
> On 29 March 2016 at 18:28, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>    The following patch improves the code in 2 out of 3 cases in
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68695
>>
>>    The patch uses more accurate costs for the RA cost improvement
>> optimization after colouring.
>>
>>    The patch was tested and bootstrapped on x86-64.  It is hard to create a
>> test to check the correct code generation.  Therefore there is no test.  As
>> the patch changes heuristics, a generated code in some cases will be
>> different but at least x86-64 tests expecting a specific code are not broken
>> by the patch.
>>
>>    Committed as rev.  234527
>>
> Hi,
>
> I've noticed that after this patch, 2 tests regress (PASS -> FAIL) on arm:
>    gcc.dg/ira-shrinkwrap-prep-2.c scan-rtl-dump pro_and_epilogue
> "Performing shrink-wrapping"
>    gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping"
>
>
Thanks, Christophe.  I am having them too on the today trunk.  I'll 
investigate this more.
Vladimir Makarov April 1, 2016, 8:26 p.m. UTC | #3
On 03/30/2016 05:23 PM, Christophe Lyon wrote:
> On 29 March 2016 at 18:28, Vladimir Makarov <vmakarov@redhat.com> wrote:
>>    The following patch improves the code in 2 out of 3 cases in
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68695
>>
>>    The patch uses more accurate costs for the RA cost improvement
>> optimization after colouring.
>>
>>    The patch was tested and bootstrapped on x86-64.  It is hard to create a
>> test to check the correct code generation.  Therefore there is no test.  As
>> the patch changes heuristics, a generated code in some cases will be
>> different but at least x86-64 tests expecting a specific code are not broken
>> by the patch.
>>
>>    Committed as rev.  234527
>>
> Hi,
>
> I've noticed that after this patch, 2 tests regress (PASS -> FAIL) on arm:
>    gcc.dg/ira-shrinkwrap-prep-2.c scan-rtl-dump pro_and_epilogue
> "Performing shrink-wrapping"
>    gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping"
>

I've checked the generated code.  RA with the patch generates a better 
code for the both tests. So shrink wrap optimization failed. The final 
code has 1 insn less for the both tests when the patch is applied.

I guess it is wrong to write quality tests based on expected code 
generated before any optimization.  It has sense if we provide the same 
input.  LLVM testsuite is mostly such tests as they have a readable IR.  
GCC unfortunately has no serialized and readable IR. On the other hand 
LLVM lacks integrated testing.

So I'd mark these tests as XFAIL or removed arm from DEJAGNU target in 
the tests.
Jakub Jelinek April 1, 2016, 8:43 p.m. UTC | #4
On Fri, Apr 01, 2016 at 04:26:41PM -0400, Vladimir Makarov wrote:
> >I've noticed that after this patch, 2 tests regress (PASS -> FAIL) on arm:
> >   gcc.dg/ira-shrinkwrap-prep-2.c scan-rtl-dump pro_and_epilogue
> >"Performing shrink-wrapping"
> >   gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping"
> >
> 
> I've checked the generated code.  RA with the patch generates a better code
> for the both tests. So shrink wrap optimization failed. The final code has 1
> insn less for the both tests when the patch is applied.
> 
> I guess it is wrong to write quality tests based on expected code generated
> before any optimization.  It has sense if we provide the same input.  LLVM
> testsuite is mostly such tests as they have a readable IR.  GCC
> unfortunately has no serialized and readable IR. On the other hand LLVM
> lacks integrated testing.
> 
> So I'd mark these tests as XFAIL or removed arm from DEJAGNU target in the
> tests.

FYI, those 2 tests also now FAIL on ppc64{,le}-linux in addition to
armv7hl-linux-gnueabi.

	Jakub
Kyrill Tkachov April 5, 2016, 9:48 a.m. UTC | #5
Hi all,

On 01/04/16 21:43, Jakub Jelinek wrote:
> On Fri, Apr 01, 2016 at 04:26:41PM -0400, Vladimir Makarov wrote:
>>> I've noticed that after this patch, 2 tests regress (PASS -> FAIL) on arm:
>>>    gcc.dg/ira-shrinkwrap-prep-2.c scan-rtl-dump pro_and_epilogue
>>> "Performing shrink-wrapping"
>>>    gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping"
>>>
>> I've checked the generated code.  RA with the patch generates a better code
>> for the both tests. So shrink wrap optimization failed. The final code has 1
>> insn less for the both tests when the patch is applied.
>>
>> I guess it is wrong to write quality tests based on expected code generated
>> before any optimization.  It has sense if we provide the same input.  LLVM
>> testsuite is mostly such tests as they have a readable IR.  GCC
>> unfortunately has no serialized and readable IR. On the other hand LLVM
>> lacks integrated testing.
>>
>> So I'd mark these tests as XFAIL or removed arm from DEJAGNU target in the
>> tests.
> FYI, those 2 tests also now FAIL on ppc64{,le}-linux in addition to
> armv7hl-linux-gnueabi.

So for the test gcc.dg/pr10474.c on arm with -marm -O3 before this patch we
perform shrink-wrapping:
     cmp    r0, #0
     bxeq    lr
     push    {r4, lr}
     mov    r4, r0
     ...

And after the patch we don't:
     push    {r4, lr}
     subs    r4, r0, #0
     popeq    {r4, pc}
     ...

The assembly after the "..." is identical.

So the resulting code is indeed shorter, though there is an
extra stack push and pop on the early return path.
A similar effect appears on gcc.dg/ira-shrinkwrap-prep-2.c.

I think both codegen decisions are valid though one
could argue that the new codegen is more appropriate for
-Os rather than -O3. If you agree then this is indeed a regression.
Though if so, it looks like a shrink-wrapping deficiency exposed by
this patch, rather than caused by it.

Jakub, do you happen to have the before and after codegen for these tests
on ppc64? I wonder if the effect is more clearcut there.

Thanks,
Kyrill


> 	Jakub
>
Segher Boessenkool April 5, 2016, 10:35 p.m. UTC | #6
On Tue, Apr 05, 2016 at 10:48:58AM +0100, Kyrill Tkachov wrote:
> So for the test gcc.dg/pr10474.c on arm with -marm -O3 before this patch we
> perform shrink-wrapping:
>     cmp    r0, #0
>     bxeq    lr
>     push    {r4, lr}
>     mov    r4, r0
>     ...
> 
> And after the patch we don't:
>     push    {r4, lr}
>     subs    r4, r0, #0
>     popeq    {r4, pc}
>     ...
> 
> The assembly after the "..." is identical.
> 
> So the resulting code is indeed shorter, though there is an
> extra stack push and pop on the early return path.
> A similar effect appears on gcc.dg/ira-shrinkwrap-prep-2.c.

The "new" code is better if there is no shrink-wrapping.  We can probably
teach prepare_shrink_wrap to do the extra register move if that will allow
us to wrap more.

> Though if so, it looks like a shrink-wrapping deficiency exposed by
> this patch, rather than caused by it.

Yes, and mostly a testcase problem even.

> Jakub, do you happen to have the before and after codegen for these tests
> on ppc64? I wonder if the effect is more clearcut there.

RTL before shrink-wrapping would be useful, too.


Segher
Kyrill Tkachov April 15, 2016, 11:06 a.m. UTC | #7
On 05/04/16 23:35, Segher Boessenkool wrote:
> On Tue, Apr 05, 2016 at 10:48:58AM +0100, Kyrill Tkachov wrote:
>> So for the test gcc.dg/pr10474.c on arm with -marm -O3 before this patch we
>> perform shrink-wrapping:
>>      cmp    r0, #0
>>      bxeq    lr
>>      push    {r4, lr}
>>      mov    r4, r0
>>      ...
>>
>> And after the patch we don't:
>>      push    {r4, lr}
>>      subs    r4, r0, #0
>>      popeq    {r4, pc}
>>      ...
>>
>> The assembly after the "..." is identical.
>>
>> So the resulting code is indeed shorter, though there is an
>> extra stack push and pop on the early return path.
>> A similar effect appears on gcc.dg/ira-shrinkwrap-prep-2.c.
> The "new" code is better if there is no shrink-wrapping.  We can probably
> teach prepare_shrink_wrap to do the extra register move if that will allow
> us to wrap more.
>
>> Though if so, it looks like a shrink-wrapping deficiency exposed by
>> this patch, rather than caused by it.
> Yes, and mostly a testcase problem even.
>
>> Jakub, do you happen to have the before and after codegen for these tests
>> on ppc64? I wonder if the effect is more clearcut there.
> RTL before shrink-wrapping would be useful, too.

So what shall we do for these tests for GCC 6?
Add an XFAIL for arm and powerpc?

Kyrill

>
> Segher
Jeff Law April 15, 2016, 4:18 p.m. UTC | #8
On 04/15/2016 05:06 AM, Kyrill Tkachov wrote:
>
> On 05/04/16 23:35, Segher Boessenkool wrote:
>> On Tue, Apr 05, 2016 at 10:48:58AM +0100, Kyrill Tkachov wrote:
>>> So for the test gcc.dg/pr10474.c on arm with -marm -O3 before this
>>> patch we
>>> perform shrink-wrapping:
>>>      cmp    r0, #0
>>>      bxeq    lr
>>>      push    {r4, lr}
>>>      mov    r4, r0
>>>      ...
>>>
>>> And after the patch we don't:
>>>      push    {r4, lr}
>>>      subs    r4, r0, #0
>>>      popeq    {r4, pc}
>>>      ...
>>>
>>> The assembly after the "..." is identical.
>>>
>>> So the resulting code is indeed shorter, though there is an
>>> extra stack push and pop on the early return path.
>>> A similar effect appears on gcc.dg/ira-shrinkwrap-prep-2.c.
>> The "new" code is better if there is no shrink-wrapping.  We can probably
>> teach prepare_shrink_wrap to do the extra register move if that will
>> allow
>> us to wrap more.
>>
>>> Though if so, it looks like a shrink-wrapping deficiency exposed by
>>> this patch, rather than caused by it.
>> Yes, and mostly a testcase problem even.
>>
>>> Jakub, do you happen to have the before and after codegen for these
>>> tests
>>> on ppc64? I wonder if the effect is more clearcut there.
>> RTL before shrink-wrapping would be useful, too.
>
> So what shall we do for these tests for GCC 6?
> Add an XFAIL for arm and powerpc?
We could just punt gcc-6 and focus on what we want for gcc-7 as this 
isn't a release critical issue.

jeff
Kyrill Tkachov April 15, 2016, 4:20 p.m. UTC | #9
On 15/04/16 17:18, Jeff Law wrote:
> On 04/15/2016 05:06 AM, Kyrill Tkachov wrote:
>>
>> On 05/04/16 23:35, Segher Boessenkool wrote:
>>> On Tue, Apr 05, 2016 at 10:48:58AM +0100, Kyrill Tkachov wrote:
>>>> So for the test gcc.dg/pr10474.c on arm with -marm -O3 before this
>>>> patch we
>>>> perform shrink-wrapping:
>>>>      cmp    r0, #0
>>>>      bxeq    lr
>>>>      push    {r4, lr}
>>>>      mov    r4, r0
>>>>      ...
>>>>
>>>> And after the patch we don't:
>>>>      push    {r4, lr}
>>>>      subs    r4, r0, #0
>>>>      popeq    {r4, pc}
>>>>      ...
>>>>
>>>> The assembly after the "..." is identical.
>>>>
>>>> So the resulting code is indeed shorter, though there is an
>>>> extra stack push and pop on the early return path.
>>>> A similar effect appears on gcc.dg/ira-shrinkwrap-prep-2.c.
>>> The "new" code is better if there is no shrink-wrapping.  We can probably
>>> teach prepare_shrink_wrap to do the extra register move if that will
>>> allow
>>> us to wrap more.
>>>
>>>> Though if so, it looks like a shrink-wrapping deficiency exposed by
>>>> this patch, rather than caused by it.
>>> Yes, and mostly a testcase problem even.
>>>
>>>> Jakub, do you happen to have the before and after codegen for these
>>>> tests
>>>> on ppc64? I wonder if the effect is more clearcut there.
>>> RTL before shrink-wrapping would be useful, too.
>>
>> So what shall we do for these tests for GCC 6?
>> Add an XFAIL for arm and powerpc?
> We could just punt gcc-6 and focus on what we want for gcc-7 as this isn't a release critical issue.
>

This was resolved with:
https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00724.html

Sorry, I should have replied to this thread...

Kyrill

> jeff
Jeff Law April 15, 2016, 4:23 p.m. UTC | #10
On 04/15/2016 10:20 AM, Kyrill Tkachov wrote:

>
> This was resolved with:
> https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00724.html
>
> Sorry, I should have replied to this thread...
No worries.  I probably should have checked the testcase before replying 
to the older email thread.

jeff
diff mbox

Patch

Index: ChangeLog
===================================================================
--- ChangeLog	(revision 234526)
+++ ChangeLog	(working copy)
@@ -1,3 +1,9 @@ 
+2016-03-29  Vladimir Makarov  <vmakarov@redhat.com>
+
+	PR rtl-optimization/68695
+	* ira-color.c (allocno_copy_cost_saving): New.
+	(improve_allocation): Use it.
+
 2016-03-29  Richard Henderson  <rth@redhat.com>
 
 	PR middle-end/70355
Index: ira-color.c
===================================================================
--- ira-color.c	(revision 234526)
+++ ira-color.c	(working copy)
@@ -2728,6 +2728,37 @@  allocno_cost_compare_func (const void *v
   return ALLOCNO_NUM (p1) - ALLOCNO_NUM (p2);
 }
 
+/* Return savings on removed copies when ALLOCNO is assigned to
+   HARD_REGNO.  */
+static int
+allocno_copy_cost_saving (ira_allocno_t allocno, int hard_regno)
+{
+  int cost = 0;
+  enum reg_class rclass;
+  ira_copy_t cp, next_cp;
+
+  rclass = REGNO_REG_CLASS (hard_regno);
+  for (cp = ALLOCNO_COPIES (allocno); cp != NULL; cp = next_cp)
+    {
+      if (cp->first == allocno)
+	{
+	  next_cp = cp->next_first_allocno_copy;
+	  if (ALLOCNO_HARD_REGNO (cp->second) != hard_regno)
+	    continue;
+	}
+      else if (cp->second == allocno)
+	{
+	  next_cp = cp->next_second_allocno_copy;
+	  if (ALLOCNO_HARD_REGNO (cp->first) != hard_regno)
+	    continue;
+	}
+      else
+	gcc_unreachable ();
+      cost += cp->freq * ira_register_move_cost[ALLOCNO_MODE (allocno)][rclass][rclass];
+    }
+  return cost;
+}
+
 /* We used Chaitin-Briggs coloring to assign as many pseudos as
    possible to hard registers.  Let us try to improve allocation with
    cost point of view.  This function improves the allocation by
@@ -2768,9 +2799,7 @@  improve_allocation (void)
 	continue;
       check++;
       aclass = ALLOCNO_CLASS (a);
-      allocno_costs = ALLOCNO_UPDATED_HARD_REG_COSTS (a);
-      if (allocno_costs == NULL)
-	allocno_costs = ALLOCNO_HARD_REG_COSTS (a);
+      allocno_costs = ALLOCNO_HARD_REG_COSTS (a);
       if ((hregno = ALLOCNO_HARD_REGNO (a)) < 0)
 	base_cost = ALLOCNO_UPDATED_MEMORY_COST (a);
       else if (allocno_costs == NULL)
@@ -2779,7 +2808,8 @@  improve_allocation (void)
 	   case).  */
 	continue;
       else
-	base_cost = allocno_costs[ira_class_hard_reg_index[aclass][hregno]];
+	base_cost = (allocno_costs[ira_class_hard_reg_index[aclass][hregno]]
+		     - allocno_copy_cost_saving (a, hregno));
       try_p = false;
       get_conflict_and_start_profitable_regs (a, false,
 					      conflicting_regs,
@@ -2797,6 +2827,7 @@  improve_allocation (void)
 	  k = allocno_costs == NULL ? 0 : j;
 	  costs[hregno] = (allocno_costs == NULL
 			   ? ALLOCNO_UPDATED_CLASS_COST (a) : allocno_costs[k]);
+	  costs[hregno] -= allocno_copy_cost_saving (a, hregno);
 	  costs[hregno] -= base_cost;
 	  if (costs[hregno] < 0)
 	    try_p = true;
@@ -2835,14 +2866,13 @@  improve_allocation (void)
 	      k = (ira_class_hard_reg_index
 		   [ALLOCNO_CLASS (conflict_a)][conflict_hregno]);
 	      ira_assert (k >= 0);
-	      if ((allocno_costs = ALLOCNO_UPDATED_HARD_REG_COSTS (conflict_a))
+	      if ((allocno_costs = ALLOCNO_HARD_REG_COSTS (conflict_a))
 		  != NULL)
 		spill_cost -= allocno_costs[k];
-	      else if ((allocno_costs = ALLOCNO_HARD_REG_COSTS (conflict_a))
-		       != NULL)
-		spill_cost -= allocno_costs[k];
 	      else
 		spill_cost -= ALLOCNO_UPDATED_CLASS_COST (conflict_a);
+	      spill_cost
+		+= allocno_copy_cost_saving (conflict_a, conflict_hregno);
 	      conflict_nregs
 		= hard_regno_nregs[conflict_hregno][ALLOCNO_MODE (conflict_a)];
 	      for (r = conflict_hregno;