PR78255: Make postreload aware of NO_FUNCTION_CSE

Message ID 584AB9AA.6030800@arm.com
State New

Commit Message

Andre Vieira (lists) Dec. 9, 2016, 2:03 p.m. UTC
Hi,

This patch fixes the issue reported in PR78255 by making postreload
aware it should not be performing CSE on functions if NO_FUNCTION_CSE is
defined to true.

Bootstrapped and fully regression tested on arm-none-linux-gnueabihf and
aarch64-unknown-linux-gnu.

I also checked that this fixes the reported issue on arm-none-eabi.

Is this OK for trunk?

Cheers,
Andre

gcc/ChangeLog:
2016-12-09  Andre Vieira <andre.simoesdiasvieira@arm.com>

        PR rtl-optimization/78255
        * postreload.c (reload_cse_simplify): Do not CSE a function if
        NO_FUNCTION_CSE is true.

gcc/testsuite/ChangeLog:
2016-12-09  Andre Vieira <andre.simoesdiasvieira@arm.com>

        PR rtl-optimization/78255
        * gcc.target/arm/pr78255-1.c: New.
        * gcc.target/arm/pr78255-2.c: New.


gcc/testsuite/ChangeLog:
2016-12-09  Andre Vieira <andre.simoesdiasvieira@arm.com>

        PR rtl-optimization/78255
        * gcc.target/aarch64/pr78255.c: New.

Comments

Bernd Schmidt Dec. 9, 2016, 3:02 p.m. UTC | #1
On 12/09/2016 03:03 PM, Andre Vieira (lists) wrote:
> This patch fixes the issue reported in PR78255 by making postreload
> aware it should not be performing CSE on functions if NO_FUNCTION_CSE is
> defined to true.
>
> Bootstrap and full regression on arm-none-linux-gnueabihf and
> aarch64-unknown-linux-gnu.
>
> Also checked this fixed the reported issue on arm-none-eabi.
>
> Is this OK for trunk?

Hmm, it probably doesn't hurt, but looking at the PR I think the 
originally reported problem suggests you need a different fix: a 
separate register class to be used for indirect sibling calls. I 
remember seeing similar issues on other targets.


Bernd
Andre Vieira (lists) Dec. 9, 2016, 3:34 p.m. UTC | #2
On 09/12/16 15:02, Bernd Schmidt wrote:
> On 12/09/2016 03:03 PM, Andre Vieira (lists) wrote:
>> This patch fixes the issue reported in PR78255 by making postreload
>> aware it should not be performing CSE on functions if NO_FUNCTION_CSE is
>> defined to true.
>>
>> Bootstrap and full regression on arm-none-linux-gnueabihf and
>> aarch64-unknown-linux-gnu.
>>
>> Also checked this fixed the reported issue on arm-none-eabi.
>>
>> Is this OK for trunk?
> 
> Hmm, it probably doesn't hurt, but looking at the PR I think the
> originally reported problem suggests you need a different fix: a
> separate register class to be used for indirect sibling calls. I
> remember seeing similar issues on other targets.
> 
> 
> Bernd

I agree that even though this "fixes" the PR issue, this change is
fixing more than just that.

As for your suggestion to use a separate register class for indirect
sibling calls: we already do, we use CALLER_SAVE_REGS. However, 'r3' is
also allowed by that scheme, as it should be: if we don't use 'r3' to
either pass an argument or align the stack, then it is perfectly valid
to use it for indirect sibling calls.

The problem is that at the time we decide whether it is safe to use
'r3', we expect the assigned registers not to change, but postreload
changes them when it shouldn't. That is why I am now telling it not to
do that. It could be that there are other cases in which the register
allocation changes after reload and before the prologue and epilogue
pass. Maybe we shouldn't be making the decision quite so early. This is
a bit of a can of worms though...

Regardless, the other testcases I add in this patch show a sub-optimal
transformation done by postreload, turning direct calls into indirect
calls, for targets which have specifically pointed out that no CSE
should be done on functions through 'NO_FUNCTION_CSE'.  Maybe it would
make more sense to split this up into two PRs, though with postreload
fixed I wouldn't be able to reproduce the failure mentioned in PR78255.

Would you prefer I create a new PR for the problem this is actually
fixing and refile this PATCH under that PR?

Cheers,
Andre
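
The direct-to-indirect call transformation described in comment #2 can be pictured at the source level. The following is only a sketch: 'bar' is given a made-up body here (in the pr78255 tests it is an external declaration), and 'foo_direct' and 'foo_indirect' are hypothetical names. In the real bug the change happens on RTL after reload, not in the C source, and an optimizing compiler is free to turn 'foo_indirect' back into a direct call.

```c
/* Hypothetical body for 'bar', which is only declared in the pr78255
   tests; it reports whether it was handed its own address.  */
int
bar (void *p)
{
  return p == (void *) bar;
}

/* What the tests ask for: a direct call to 'bar', expected to show up
   in the assembly as "b bar".  */
int
foo_direct (void)
{
  return bar ((void *) bar);
}

/* Source-level analogue of the unwanted result: the address of 'bar'
   is already held in a variable (a register, after reload) because it
   is passed as the argument, and the call is made through that copy.  */
int
foo_indirect (void)
{
  int (*fn) (void *) = bar;
  return fn ((void *) fn);
}
```

Both functions compute the same value; the difference the thread cares about is only whether the call instruction names 'bar' directly or goes through a register.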
Bernd Schmidt Dec. 9, 2016, 3:58 p.m. UTC | #3
On 12/09/2016 04:34 PM, Andre Vieira (lists) wrote:

> Regardless, the other testcases I add in this patch show a sub-optimal
> transformation done by postreload, turning direct calls into indirect
> calls, for targets which have specifically pointed out that no CSE
> should be done on functions through 'NO_FUNCTION_CSE'.

What I'm wondering about is whether the patch wouldn't also prevent the 
opposite transformation. Is there a reason not to do that one? Can the 
problem be modeled by tweaking costs?

> Would you prefer I create a new PR for the problem this is actually
> fixing and refile this PATCH under that PR?

Well, as long as you're working on fixing it I see no reason to clutter 
the bug database for the function cse issue, but do keep the existing PR 
open if there also ought to be register class changes.


Bernd
Jeff Law Dec. 9, 2016, 4:01 p.m. UTC | #4
On 12/09/2016 08:02 AM, Bernd Schmidt wrote:
> On 12/09/2016 03:03 PM, Andre Vieira (lists) wrote:
>> This patch fixes the issue reported in PR78255 by making postreload
>> aware it should not be performing CSE on functions if NO_FUNCTION_CSE is
>> defined to true.
>>
>> Bootstrap and full regression on arm-none-linux-gnueabihf and
>> aarch64-unknown-linux-gnu.
>>
>> Also checked this fixed the reported issue on arm-none-eabi.
>>
>> Is this OK for trunk?
>
> Hmm, it probably doesn't hurt, but looking at the PR I think the
> originally reported problem suggests you need a different fix: a
> separate register class to be used for indirect sibling calls. I
> remember seeing similar issues on other targets.
I think we actually split the call patterns into direct and indirect 
variants on the PA when we stumbled on this in cse.c.

Jeff
Ramana Radhakrishnan Dec. 9, 2016, 4:02 p.m. UTC | #5
On Fri, Dec 9, 2016 at 3:58 PM, Bernd Schmidt <bschmidt@redhat.com> wrote:
> On 12/09/2016 04:34 PM, Andre Vieira (lists) wrote:
>
>> Regardless, the other testcases I add in this patch show a sub-optimal
>> transformation done by postreload, turning direct calls into indirect
>> calls, for targets which have specifically pointed out that no CSE
>> should be done on functions through 'NO_FUNCTION_CSE'.
>
>
> What I'm wondering about is whether the patch wouldn't also prevent the
> opposite transformation. Is there a reason not to do that one? Can the
> problem be modeled by tweaking costs?

I really don't think we should have a solution that relies on costs
for correctness.

regards
Ramana
Andre Vieira (lists) Dec. 9, 2016, 4:16 p.m. UTC | #6
On 09/12/16 16:02, Ramana Radhakrishnan wrote:
> On Fri, Dec 9, 2016 at 3:58 PM, Bernd Schmidt <bschmidt@redhat.com> wrote:
>> On 12/09/2016 04:34 PM, Andre Vieira (lists) wrote:
>>
>>> Regardless, the other testcases I add in this patch show a sub-optimal
>>> transformation done by postreload, turning direct calls into indirect
>>> calls, for targets which have specifically pointed out that no CSE
>>> should be done on functions through 'NO_FUNCTION_CSE'.
>>
>>
>> What I'm wondering about is whether the patch wouldn't also prevent the
>> opposite transformation. Is there a reason not to do that one? Can the
>> problem be modeled by tweaking costs?
> 
> I really don't think we should have a solution that relies on costs
> for correctness .
> 
> regards
> Ramana
> 

Regardless, 'reload_cse_simplify' would never perform the opposite
transformation.  It checks whether it can replace anything within the
first argument, INSN, with the second argument, TESTREG. As the name
implies, this will always be a register. I double-checked: the function
is only called in 'reload_cse_regs', and 'testreg' is created using
'gen_rtx_REG'.

Cheers,
Andre
Bernd Schmidt Dec. 9, 2016, 4:31 p.m. UTC | #7
On 12/09/2016 05:16 PM, Andre Vieira (lists) wrote:

> Regardless, 'reload_cse_simplify' would never perform the opposite
> transformation.  It checks whether it can replace anything within the
> first argument INSN, with the second argument TESTREG. As the name
> implies this will always be a register. I double checked, the function
> is only called in 'reload_cse_regs' and 'testreg' is created using
> 'gen_rtx_REG'.

Ok, let's go ahead with it.


Bernd
Christophe Lyon Dec. 12, 2016, 9:04 a.m. UTC | #8
Hi Andre,

On 9 December 2016 at 17:16, Andre Vieira (lists)
<Andre.SimoesDiasVieira@arm.com> wrote:
> On 09/12/16 16:02, Ramana Radhakrishnan wrote:
>> On Fri, Dec 9, 2016 at 3:58 PM, Bernd Schmidt <bschmidt@redhat.com> wrote:
>>> On 12/09/2016 04:34 PM, Andre Vieira (lists) wrote:
>>>
>>>> Regardless, the other testcases I add in this patch show a sub-optimal
>>>> transformation done by postreload, turning direct calls into indirect
>>>> calls, for targets which have specifically pointed out that no CSE
>>>> should be done on functions through 'NO_FUNCTION_CSE'.
>>>
>>>
>>> What I'm wondering about is whether the patch wouldn't also prevent the
>>> opposite transformation. Is there a reason not to do that one? Can the
>>> problem be modeled by tweaking costs?
>>
>> I really don't think we should have a solution that relies on costs
>> for correctness .
>>
>> regards
>> Ramana
>>
>
> Regardless, 'reload_cse_simplify' would never perform the opposite
> transformation.  It checks whether it can replace anything within the
> first argument INSN, with the second argument TESTREG. As the name
> implies this will always be a register. I double checked, the function
> is only called in 'reload_cse_regs' and 'testreg' is created using
> 'gen_rtx_REG'.
>

The new test (gcc.target/arm/pr78255-2.c scan-assembler b\\s+bar)
added at r243494 fails on old arm architectures, such as:
* arm-none-linux-gnueabi, forcing -march=armv5t in runtestflags
* arm-none-eabi with default cpu/fpu/mode

Christophe


> Cheers,
> Andre
Andre Vieira (lists) Jan. 6, 2017, 10:53 a.m. UTC | #9
On 09/12/16 16:31, Bernd Schmidt wrote:
> On 12/09/2016 05:16 PM, Andre Vieira (lists) wrote:
> 
>> Regardless, 'reload_cse_simplify' would never perform the opposite
>> transformation.  It checks whether it can replace anything within the
>> first argument INSN, with the second argument TESTREG. As the name
>> implies this will always be a register. I double checked, the function
>> is only called in 'reload_cse_regs' and 'testreg' is created using
>> 'gen_rtx_REG'.
> 
> Ok, let's go ahead with it.
> 
> 
> Bernd
> 
Hello,

Is it OK to backport this (including the testcase fix) to gcc-6-branch?

The patches apply cleanly, and full bootstrap and regression tests pass
for aarch64- and arm-none-linux-gnueabihf. Regression tested for
arm-none-eabi.

Cheers,
Andre
Jeff Law Jan. 6, 2017, 3:47 p.m. UTC | #10
On 01/06/2017 03:53 AM, Andre Vieira (lists) wrote:
> On 09/12/16 16:31, Bernd Schmidt wrote:
>> On 12/09/2016 05:16 PM, Andre Vieira (lists) wrote:
>>
>>> Regardless, 'reload_cse_simplify' would never perform the opposite
>>> transformation.  It checks whether it can replace anything within the
>>> first argument INSN, with the second argument TESTREG. As the name
>>> implies this will always be a register. I double checked, the function
>>> is only called in 'reload_cse_regs' and 'testreg' is created using
>>> 'gen_rtx_REG'.
>>
>> Ok, let's go ahead with it.
>>
>>
>> Bernd
>>
> Hello,
>
> Is it OK to backport this (including the testcase fix) to gcc-6-branch?
>
> Patches apply cleanly and full bootstrap and regression tests for
> aarch64- and arm-none-linux-gnueabihf. Regression tested for arm-none-eabi.
Yes, that should be fine to backport to the active release branches.

jeff
Andre Vieira (lists) Jan. 11, 2017, 3:09 p.m. UTC | #11
On 06/01/17 15:47, Jeff Law wrote:
> On 01/06/2017 03:53 AM, Andre Vieira (lists) wrote:
>> On 09/12/16 16:31, Bernd Schmidt wrote:
>>> On 12/09/2016 05:16 PM, Andre Vieira (lists) wrote:
>>>
>>>> Regardless, 'reload_cse_simplify' would never perform the opposite
>>>> transformation.  It checks whether it can replace anything within the
>>>> first argument INSN, with the second argument TESTREG. As the name
>>>> implies this will always be a register. I double checked, the function
>>>> is only called in 'reload_cse_regs' and 'testreg' is created using
>>>> 'gen_rtx_REG'.
>>>
>>> Ok, let's go ahead with it.
>>>
>>>
>>> Bernd
>>>
>> Hello,
>>
>> Is it OK to backport this (including the testcase fix) to gcc-6-branch?
>>
>> Patches apply cleanly and full bootstrap and regression tests for
>> aarch64- and arm-none-linux-gnueabihf. Regression tested for
>> arm-none-eabi.
> Yes, that should be fine to backport to the active release branches.
> 
> jeff
OK, I have committed the backports to gcc-5 and gcc-6 branches.

Cheers,
Andre

Patch

diff --git a/gcc/postreload.c b/gcc/postreload.c
index 539ad33b6c3eb1b968677419a7420badc3a52f01..8325d121c403786fdb7804956724a81d134252a2 100644
--- a/gcc/postreload.c
+++ b/gcc/postreload.c
@@ -90,6 +90,11 @@  reload_cse_simplify (rtx_insn *insn, rtx testreg)
   basic_block insn_bb = BLOCK_FOR_INSN (insn);
   unsigned insn_bb_succs = EDGE_COUNT (insn_bb->succs);
 
+  /* If NO_FUNCTION_CSE has been set by the target, then we should not try
+     to cse function calls.  */
+  if (NO_FUNCTION_CSE && CALL_P (insn))
+    return false;
+
   if (GET_CODE (body) == SET)
     {
       int count = 0;
diff --git a/gcc/testsuite/gcc.target/aarch64/pr78255.c b/gcc/testsuite/gcc.target/aarch64/pr78255.c
new file mode 100644
index 0000000000000000000000000000000000000000..b078cf3e1c1c7717c9e227721a367f9846f0c7fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr78255.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcmodel=tiny" } */
+
+extern int bar (void *);
+
+int
+foo (void)
+{
+  return bar ((void *)bar);
+}
+
+/* { dg-final { scan-assembler "b\\s+bar" } } */
diff --git a/gcc/testsuite/gcc.target/arm/pr78255-1.c b/gcc/testsuite/gcc.target/arm/pr78255-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..4901acea51466c0bac92d9cb90e52b00b450d88a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr78255-1.c
@@ -0,0 +1,57 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2" }  */
+
+#include <string.h>
+
+struct table_s
+    {
+    void (*fun0)
+        ( void );
+    void (*fun1)
+        ( void );
+    void (*fun2)
+        ( void );
+    void (*fun3)
+        ( void );
+    void (*fun4)
+        ( void );
+    void (*fun5)
+        ( void );
+    void (*fun6)
+        ( void );
+    void (*fun7)
+        ( void );
+    } table;
+
+void callback0(){__asm("mov r0, r0 \n\t");}
+void callback1(){__asm("mov r0, r0 \n\t");}
+void callback2(){__asm("mov r0, r0 \n\t");}
+void callback3(){__asm("mov r0, r0 \n\t");}
+void callback4(){__asm("mov r0, r0 \n\t");}
+
+void test (void) {
+    memset(&table, 0, sizeof table);
+
+    asm volatile ("" : : : "r3");
+
+    table.fun0 = callback0;
+    table.fun1 = callback1;
+    table.fun2 = callback2;
+    table.fun3 = callback3;
+    table.fun4 = callback4;
+    table.fun0();
+}
+
+void foo (void)
+{
+  __builtin_abort ();
+}
+
+int main (void)
+{
+  unsigned long p = (unsigned long) &foo;
+  asm volatile ("mov r3, %0" : : "r" (p));
+  test ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/arm/pr78255-2.c b/gcc/testsuite/gcc.target/arm/pr78255-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..9e64ef3939465b088e35a01d4bb23fd50d43006d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr78255-2.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" }  */
+
+extern int bar (void *);
+
+int
+foo (void)
+{
+  return bar ((void*)bar);
+}
+
+/* { dg-final { scan-assembler "b\\s+bar" } } */
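
The shape of the guard added to 'reload_cse_simplify' in the postreload.c hunk can be tried out in isolation with a toy model. This is a minimal sketch, not GCC code: every 'toy_' and 'TOY_' name is a hypothetical stand-in for the real 'rtx_insn', 'CALL_P' and 'NO_FUNCTION_CSE', and the return value only reports whether the toy insn was modified; it is not meant to mirror what the real function returns.

```c
#include <stdbool.h>

/* Toy instruction kinds; stand-ins for real RTL insn codes.  */
enum toy_insn_kind { TOY_SET, TOY_CALL };

/* Toy instruction; 'changed' records whether it was "simplified".  */
struct toy_insn
{
  enum toy_insn_kind kind;
  bool changed;
};

/* Stand-in for the target macro NO_FUNCTION_CSE (assumed true here).  */
#define TOY_NO_FUNCTION_CSE true

/* Stand-in for CALL_P: is this insn a call?  */
static bool
toy_call_p (const struct toy_insn *insn)
{
  return insn->kind == TOY_CALL;
}

/* Toy analogue of the patched function: bail out before touching a
   call insn when the target asked for no function CSE.  */
bool
toy_reload_cse_simplify (struct toy_insn *insn)
{
  if (TOY_NO_FUNCTION_CSE && toy_call_p (insn))
    return false;

  /* Placeholder for the real simplification work.  */
  insn->changed = true;
  return true;
}
```

With 'TOY_NO_FUNCTION_CSE' set to true, call insns pass through untouched while other insns are still processed, which is the behaviour the patch is after.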