diff mbox series

[rs6000] : mark clobber for registers changed by untpyed_call

Message ID 1580910807-24635-1-git-send-email-guojiufu@linux.ibm.com
State New
Headers show
Series [rs6000] : mark clobber for registers changed by untpyed_call | expand

Commit Message

Jiufu Guo Feb. 5, 2020, 1:53 p.m. UTC
As PR93047 said, __builtin_apply/__builtin_return does not work well with
-frename-registers.  This is caused by return register(e.g. r3) is used to
rename another register, before return register is stored to stack.

This patch fix this issue by emitting clobber for those egisters which
maybe changed by untyped call.

Because this would be a kind of wrong code, so I'm thinking to submit to
gcc10.

Bootstrap and regtest on powerpc64le are pass.  Is this okay for trunk?

Thanks,
Jiufu

gcc/
2020-02-05  Jiufu Guo  <guojiufu@linux.ibm.com>

	PR target/93047
	* config/rs6000/rs6000.md (untyped_call): add emit_clobber.

gcc/testsuite
2020-02-05  Jiufu Guo  <guojiufu@linux.ibm.com>

	PR target/93047
	* gcc.dg/torture/stackalign/builtin-return-2.c: New case.

Comments

Segher Boessenkool Feb. 5, 2020, 7:25 p.m. UTC | #1
On Wed, Feb 05, 2020 at 09:53:27PM +0800, Jiufu Guo wrote:
> As PR93047 said, __builtin_apply/__builtin_return does not work well with
> -frename-registers.  This is caused by return register(e.g. r3) is used to
> rename another register, before return register is stored to stack.
> 
> This patch fix this issue by emitting clobber for those egisters which
> maybe changed by untyped call.

Yeah.  untyped_call does

  /* The optimizer does not know that the call sets the function value
     registers we stored in the result block.  We avoid problems by
     claiming that all hard registers are used and clobbered at this
     point.  */
  emit_insn (gen_blockage ());

but blockage does not say registers are used and clobbered:

@cindex @code{blockage} instruction pattern
@item @samp{blockage}
This pattern defines a pseudo insn that prevents the instruction
scheduler and other passes from moving instructions and using register
equivalences across the boundary defined by the blockage insn.
This needs to be an UNSPEC_VOLATILE pattern or a volatile ASM.

Many archs have this same implementation of untyped_call (and of
blockage, too).  It all just works by luck (or it doesn't work).

What we have is:

  emit_call_insn (gen_call (operands[0], const0_rtx, const0_rtx));

  for (i = 0; i < XVECLEN (operands[2], 0); i++)
    {
      rtx set = XVECEXP (operands[2], 0, i);
      emit_move_insn (SET_DEST (set), SET_SRC (set));
    }

... and nothing in the rtl stream says that those return registers are
actually set by that call.  Maybe we should use gen_call_value?  Can we
ever be asked to return more than a single thing here?

Some trivial patch comments:

> gcc/
> 2020-02-05  Jiufu Guo  <guojiufu@linux.ibm.com>
> 
> 	PR target/93047
> 	* config/rs6000/rs6000.md (untyped_call): add emit_clobber.

"Add", capital.

> gcc/testsuite
> 2020-02-05  Jiufu Guo  <guojiufu@linux.ibm.com>
> 
> 	PR target/93047
> 	* gcc.dg/torture/stackalign/builtin-return-2.c: New case.

"New test case."  (And there is trailing whitespace here; Git warns
about that, so this won't happen much in the future :-) )


Segher
Jiufu Guo Feb. 6, 2020, 2:49 a.m. UTC | #2
Segher Boessenkool <segher@kernel.crashing.org> writes:

> On Wed, Feb 05, 2020 at 09:53:27PM +0800, Jiufu Guo wrote:
>> As PR93047 said, __builtin_apply/__builtin_return does not work well with
>> -frename-registers.  This is caused by return register(e.g. r3) is used to
>> rename another register, before return register is stored to stack.
>> 
>> This patch fix this issue by emitting clobber for those egisters which
>> maybe changed by untyped call.
>
> Yeah.  untyped_call does
>
>   /* The optimizer does not know that the call sets the function value
>      registers we stored in the result block.  We avoid problems by
>      claiming that all hard registers are used and clobbered at this
>      point.  */
>   emit_insn (gen_blockage ());
>
> but blockage does not say registers are used and clobbered:
>
> @cindex @code{blockage} instruction pattern
> @item @samp{blockage}
> This pattern defines a pseudo insn that prevents the instruction
> scheduler and other passes from moving instructions and using register
> equivalences across the boundary defined by the blockage insn.
> This needs to be an UNSPEC_VOLATILE pattern or a volatile ASM.
>
> Many archs have this same implementation of untyped_call (and of
> blockage, too).  It all just works by luck (or it doesn't work).
>
> What we have is:
>
>   emit_call_insn (gen_call (operands[0], const0_rtx, const0_rtx));
>
>   for (i = 0; i < XVECLEN (operands[2], 0); i++)
>     {
>       rtx set = XVECEXP (operands[2], 0, i);
>       emit_move_insn (SET_DEST (set), SET_SRC (set));
>     }
>
> ... and nothing in the rtl stream says that those return registers are
> actually set by that call.  Maybe we should use gen_call_value?  Can we
> ever be asked to return more than a single thing here?
I was also thinking about using "gen_call_value" or "emit_clobber (r3)"
which could generate rtl: "%3:DI=call [foo]" or "call [foo]; clobber
r3".  This could tell optimizer that %3 is changed.  While there are
potential issues that untyped_call may change other registers.  So, mark
clobber for all touched registers maybe more safe.

>
> Some trivial patch comments:
>
>> gcc/
>> 2020-02-05  Jiufu Guo  <guojiufu@linux.ibm.com>
>> 
>> 	PR target/93047
>> 	* config/rs6000/rs6000.md (untyped_call): add emit_clobber.
>
> "Add", capital.
Thanks,
>
>> gcc/testsuite
>> 2020-02-05  Jiufu Guo  <guojiufu@linux.ibm.com>
>> 
>> 	PR target/93047
>> 	* gcc.dg/torture/stackalign/builtin-return-2.c: New case.
>
> "New test case."  (And there is trailing whitespace here; Git warns
> about that, so this won't happen much in the future :-) )
Oh, get it, thanks. The withspace is after this line.

Jiufu
>
>
> Segher
Segher Boessenkool Feb. 6, 2020, 3:35 p.m. UTC | #3
Hi!

On Thu, Feb 06, 2020 at 10:49:36AM +0800, Jiufu Guo wrote:
> >   emit_call_insn (gen_call (operands[0], const0_rtx, const0_rtx));
> >
> >   for (i = 0; i < XVECLEN (operands[2], 0); i++)
> >     {
> >       rtx set = XVECEXP (operands[2], 0, i);
> >       emit_move_insn (SET_DEST (set), SET_SRC (set));
> >     }
> >
> > ... and nothing in the rtl stream says that those return registers are
> > actually set by that call.  Maybe we should use gen_call_value?  Can we
> > ever be asked to return more than a single thing here?
> I was also thinking about using "gen_call_value" or "emit_clobber (r3)"
> which could generate rtl: "%3:DI=call [foo]" or "call [foo]; clobber
> r3".  This could tell optimizer that %3 is changed.

The problem with "call ; clobber r3" is that some set+use of a pseudo can
be moved between these, and then rnreg can rename that to r3 again.  We
really need to show the call sets r3, in the general case (or that r3 is
live after the call, at least).

> While there are
> potential issues that untyped_call may change other registers.  So, mark
> clobber for all touched registers maybe more safe.

Well, we can derive what registers it sets, perhaps?  What does x86 do
here?  It does something, I know that, haven't looked much deeper yet
though :-)

In general: this is not a problem for us only; some other archs may have
found a good solution already.


Segher
Jiufu Guo Feb. 7, 2020, 7:10 a.m. UTC | #4
Segher Boessenkool <segher@kernel.crashing.org> writes:

> Hi!
>
> On Thu, Feb 06, 2020 at 10:49:36AM +0800, Jiufu Guo wrote:
>> >   emit_call_insn (gen_call (operands[0], const0_rtx, const0_rtx));
>> >
>> >   for (i = 0; i < XVECLEN (operands[2], 0); i++)
>> >     {
>> >       rtx set = XVECEXP (operands[2], 0, i);
>> >       emit_move_insn (SET_DEST (set), SET_SRC (set));
>> >     }
>> >
>> > ... and nothing in the rtl stream says that those return registers are
>> > actually set by that call.  Maybe we should use gen_call_value?  Can we
>> > ever be asked to return more than a single thing here?
>> I was also thinking about using "gen_call_value" or "emit_clobber (r3)"
>> which could generate rtl: "%3:DI=call [foo]" or "call [foo]; clobber
>> r3".  This could tell optimizer that %3 is changed.
>
> The problem with "call ; clobber r3" is that some set+use of a pseudo can
> be moved between these, and then rnreg can rename that to r3 again.  We
> really need to show the call sets r3, in the general case (or that r3 is
> live after the call, at least).
Thanks! More careful thought You are right: set+use maybe able to move between "call ;
clobber". "%3=call" is ok without this issue.

>
>> While there are
>> potential issues that untyped_call may change other registers.  So, mark
>> clobber for all touched registers maybe more safe.
>
> Well, we can derive what registers it sets, perhaps?  What does x86 do
> here?  It does something, I know that, haven't looked much deeper yet
> though :-)
For x86, it is generates something like "%c=call":
  Ix86_Expand_call ((TARGET_FLOAT_RETURNS_IN_80387
		     ? gen_rtx_REG (XCmode, FIRST_FLOAT_REG) : NULL),
		    operands[0], const0_rtx,...
first argument of ix86_expand_call is 'set of call'. As comment of
untyped_call of i386.md:
  /* In order to give reg-stack an easier job in validating two
     coprocessor registers as containing a possible return value,
     simply pretend the untyped call returns a complex long double
     value. 

For ppc, maybe %3:DI,%3:TI, %1SF... maybe set by untyped_call, right?
And from trunk source code(builtins.c and .md for targets):
  for (i = 0; i < XVECLEN (operands[2], 0); i++)
    {
      rtx set = XVECEXP (operands[2], 0, i);
      emit_move_insn (SET_DEST (set), SET_SRC (set));
    }

Above code may means all registers in operands[2] are stored/moved to
stack, those registers maybe altered. Any corrections?

Thanks for your comments and sugguestions!
Jiufu

>
> In general: this is not a problem for us only; some other archs may have
> found a good solution already.
>
>
> Segher
Segher Boessenkool Feb. 8, 2020, 4:17 p.m. UTC | #5
Hi!

On Fri, Feb 07, 2020 at 03:10:05PM +0800, Jiufu Guo wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > On Thu, Feb 06, 2020 at 10:49:36AM +0800, Jiufu Guo wrote:
> >> > ... and nothing in the rtl stream says that those return registers are
> >> > actually set by that call.  Maybe we should use gen_call_value?  Can we
> >> > ever be asked to return more than a single thing here?
> >> I was also thinking about using "gen_call_value" or "emit_clobber (r3)"
> >> which could generate rtl: "%3:DI=call [foo]" or "call [foo]; clobber
> >> r3".  This could tell optimizer that %3 is changed.
> >
> > The problem with "call ; clobber r3" is that some set+use of a pseudo can
> > be moved between these, and then rnreg can rename that to r3 again.  We
> > really need to show the call sets r3, in the general case (or that r3 is
> > live after the call, at least).
> Thanks! More careful thought You are right: set+use maybe able to move between "call ;
> clobber". "%3=call" is ok without this issue.

Yeah.

And we do not know which of the register will be used for the return, in
untyped_call (only untyped-return knows).  But we can add clobbers of all
registers that *might* be used for the return, we do know that here, see
operands[2] of untyped_call.

This really should be done in generic code though?


Segher
Segher Boessenkool Feb. 8, 2020, 9:45 p.m. UTC | #6
On Sat, Feb 08, 2020 at 10:17:42AM -0600, Segher Boessenkool wrote:
> And we do not know which of the register will be used for the return, in
> untyped_call (only untyped-return knows).  But we can add clobbers of all
> registers that *might* be used for the return, we do know that here, see
> operands[2] of untyped_call.

Clobbers in parallel to the call, I mean, not as separate insns later.


Segher
Jiufu Guo Feb. 14, 2020, 6:58 a.m. UTC | #7
Segher Boessenkool <segher@kernel.crashing.org> writes:

> On Sat, Feb 08, 2020 at 10:17:42AM -0600, Segher Boessenkool wrote:
>> And we do not know which of the register will be used for the return, in
>> untyped_call (only untyped-return knows).  But we can add clobbers of all
>> registers that *might* be used for the return, we do know that here, see
>> operands[2] of untyped_call.
>
> Clobbers in parallel to the call, I mean, not as separate insns later.
>
Thanks Segher, as discussed, we may refine the patch by adding a
'barrier' to avoid instructions mis-scheduled.  This improvement patch
maybe fine for practice.

Below is the updated patch: bootstrap and regtest pass on powerpc64le.
Is this ok to submit to trunk?
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index f3c8eb0..2fbe68f 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -10867,6 +10867,10 @@
 
   emit_call_insn (gen_call (operands[0], const0_rtx, const0_rtx));
 
+  for (int i = 0; i < XVECLEN (operands[2], 0); i++)
+    emit_clobber (SET_SRC (XVECEXP (operands[2], 0, i)));
+  emit_insn (gen_blockage ());
+
   for (i = 0; i < XVECLEN (operands[2], 0); i++)
     {
       rtx set = XVECEXP (operands[2], 0, i);
diff --git a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-2.c b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-2.c
new file mode 100644
index 0000000..7719109
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-2.c
@@ -0,0 +1,40 @@
+/* PR target/93047 */
+/* Originator: Andrew Church <gcczilla@achurch.org> */
+/* { dg-do run } */
+/* { dg-additional-options "-O3 -frename-registers" } */
+/* { dg-require-effective-target untyped_assembly } */
+
+#ifdef __MMIX__
+/* No parameters on stack for bar.  */
+#define STACK_ARGUMENTS_SIZE 0
+#else
+#define STACK_ARGUMENTS_SIZE 64
+#endif
+
+extern void abort(void);
+
+int foo(int n)
+{
+  return n+1;
+}
+
+int bar(int n)
+{
+  __builtin_return(__builtin_apply((void (*)(void))foo, __builtin_apply_args(),
+				   STACK_ARGUMENTS_SIZE));
+}
+
+int main(void)
+{
+  /* Allocate 64 bytes on the stack to make sure that __builtin_apply
+     can read at least 64 bytes above the return address.  */
+  char dummy[64];
+
+  __asm__ ("" : : "" (dummy));
+
+  if (bar(1) != 2)
+    abort();
+
+  return 0;
+}
+
>
> Segher
Segher Boessenkool Feb. 14, 2020, 6:12 p.m. UTC | #8
On Fri, Feb 14, 2020 at 02:58:49PM +0800, Jiufu Guo wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> 
> > On Sat, Feb 08, 2020 at 10:17:42AM -0600, Segher Boessenkool wrote:
> >> And we do not know which of the register will be used for the return, in
> >> untyped_call (only untyped-return knows).  But we can add clobbers of all
> >> registers that *might* be used for the return, we do know that here, see
> >> operands[2] of untyped_call.
> >
> > Clobbers in parallel to the call, I mean, not as separate insns later.
> >
> Thanks Segher, as discussed, we may refine the patch by adding a
> 'barrier' to avoid instructions mis-scheduled.  This improvement patch
> maybe fine for practice.
> 
> Below is the updated patch: bootstrap and regtest pass on powerpc64le.
> Is this ok to submit to trunk?

This is okay for trunk, and for backports after a week or so.  Thanks!

(It doesn't fully solve the problem, as we discussed; because the return
registers are not seen (by df, etc.) as live immediately after the call,
new instructions using those registers can be created between the call
and the later clobber.  The barrier does nothing to prevent that.)

Don't forget to do a changelog entry, mentioning the PR as well :-)


Segher
diff mbox series

Patch

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index f3c8eb0..d68bc24 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -10867,6 +10867,9 @@ 
 
   emit_call_insn (gen_call (operands[0], const0_rtx, const0_rtx));
 
+  for (int i = 0; i < XVECLEN (operands[2], 0); i++)
+    emit_clobber (SET_SRC (XVECEXP (operands[2], 0, i)));
+
   for (i = 0; i < XVECLEN (operands[2], 0); i++)
     {
       rtx set = XVECEXP (operands[2], 0, i);
diff --git a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-2.c b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-2.c
new file mode 100644
index 0000000..7719109
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-2.c
@@ -0,0 +1,40 @@ 
+/* PR target/93047 */
+/* Originator: Andrew Church <gcczilla@achurch.org> */
+/* { dg-do run } */
+/* { dg-additional-options "-O3 -frename-registers" } */
+/* { dg-require-effective-target untyped_assembly } */
+
+#ifdef __MMIX__
+/* No parameters on stack for bar.  */
+#define STACK_ARGUMENTS_SIZE 0
+#else
+#define STACK_ARGUMENTS_SIZE 64
+#endif
+
+extern void abort(void);
+
+int foo(int n)
+{
+  return n+1;
+}
+
+int bar(int n)
+{
+  __builtin_return(__builtin_apply((void (*)(void))foo, __builtin_apply_args(),
+				   STACK_ARGUMENTS_SIZE));
+}
+
+int main(void)
+{
+  /* Allocate 64 bytes on the stack to make sure that __builtin_apply
+     can read at least 64 bytes above the return address.  */
+  char dummy[64];
+
+  __asm__ ("" : : "" (dummy));
+
+  if (bar(1) != 2)
+    abort();
+
+  return 0;
+}
+