Message ID | 1580910807-24635-1-git-send-email-guojiufu@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | [rs6000] : mark clobber for registers changed by untpyed_call | expand |
On Wed, Feb 05, 2020 at 09:53:27PM +0800, Jiufu Guo wrote: > As PR93047 said, __builtin_apply/__builtin_return does not work well with > -frename-registers. This is caused by return register(e.g. r3) is used to > rename another register, before return register is stored to stack. > > This patch fix this issue by emitting clobber for those egisters which > maybe changed by untyped call. Yeah. untyped_call does /* The optimizer does not know that the call sets the function value registers we stored in the result block. We avoid problems by claiming that all hard registers are used and clobbered at this point. */ emit_insn (gen_blockage ()); but blockage does not say registers are used and clobbered: @cindex @code{blockage} instruction pattern @item @samp{blockage} This pattern defines a pseudo insn that prevents the instruction scheduler and other passes from moving instructions and using register equivalences across the boundary defined by the blockage insn. This needs to be an UNSPEC_VOLATILE pattern or a volatile ASM. Many archs have this same implementation of untyped_call (and of blockage, too). It all just works by luck (or it doesn't work). What we have is: emit_call_insn (gen_call (operands[0], const0_rtx, const0_rtx)); for (i = 0; i < XVECLEN (operands[2], 0); i++) { rtx set = XVECEXP (operands[2], 0, i); emit_move_insn (SET_DEST (set), SET_SRC (set)); } ... and nothing in the rtl stream says that those return registers are actually set by that call. Maybe we should use gen_call_value? Can we ever be asked to return more than a single thing here? Some trivial patch comments: > gcc/ > 2020-02-05 Jiufu Guo <guojiufu@linux.ibm.com> > > PR target/93047 > * config/rs6000/rs6000.md (untyped_call): add emit_clobber. "Add", capital. > gcc/testsuite > 2020-02-05 Jiufu Guo <guojiufu@linux.ibm.com> > > PR target/93047 > * gcc.dg/torture/stackalign/builtin-return-2.c: New case. "New test case." (And there is trailing whitespace here; Git warns about that, so this won't happen much in the future :-) ) Segher
Segher Boessenkool <segher@kernel.crashing.org> writes: > On Wed, Feb 05, 2020 at 09:53:27PM +0800, Jiufu Guo wrote: >> As PR93047 said, __builtin_apply/__builtin_return does not work well with >> -frename-registers. This is caused by return register(e.g. r3) is used to >> rename another register, before return register is stored to stack. >> >> This patch fix this issue by emitting clobber for those egisters which >> maybe changed by untyped call. > > Yeah. untyped_call does > > /* The optimizer does not know that the call sets the function value > registers we stored in the result block. We avoid problems by > claiming that all hard registers are used and clobbered at this > point. */ > emit_insn (gen_blockage ()); > > but blockage does not say registers are used and clobbered: > > @cindex @code{blockage} instruction pattern > @item @samp{blockage} > This pattern defines a pseudo insn that prevents the instruction > scheduler and other passes from moving instructions and using register > equivalences across the boundary defined by the blockage insn. > This needs to be an UNSPEC_VOLATILE pattern or a volatile ASM. > > Many archs have this same implementation of untyped_call (and of > blockage, too). It all just works by luck (or it doesn't work). > > What we have is: > > emit_call_insn (gen_call (operands[0], const0_rtx, const0_rtx)); > > for (i = 0; i < XVECLEN (operands[2], 0); i++) > { > rtx set = XVECEXP (operands[2], 0, i); > emit_move_insn (SET_DEST (set), SET_SRC (set)); > } > > ... and nothing in the rtl stream says that those return registers are > actually set by that call. Maybe we should use gen_call_value? Can we > ever be asked to return more than a single thing here? I was also thinking about using "gen_call_value" or "emit_clobber (r3)" which could generate rtl: "%3:DI=call [foo]" or "call [foo]; clobber r3". This could tell optimizer that %3 is changed. While there are potential issues that untyped_call may change other registers. So, mark clobber for all touched registers maybe more safe. > > Some trivial patch comments: > >> gcc/ >> 2020-02-05 Jiufu Guo <guojiufu@linux.ibm.com> >> >> PR target/93047 >> * config/rs6000/rs6000.md (untyped_call): add emit_clobber. > > "Add", capital. Thanks, > >> gcc/testsuite >> 2020-02-05 Jiufu Guo <guojiufu@linux.ibm.com> >> >> PR target/93047 >> * gcc.dg/torture/stackalign/builtin-return-2.c: New case. > > "New test case." (And there is trailing whitespace here; Git warns > about that, so this won't happen much in the future :-) ) Oh, get it, thanks. The withspace is after this line. Jiufu > > > Segher
Hi! On Thu, Feb 06, 2020 at 10:49:36AM +0800, Jiufu Guo wrote: > > emit_call_insn (gen_call (operands[0], const0_rtx, const0_rtx)); > > > > for (i = 0; i < XVECLEN (operands[2], 0); i++) > > { > > rtx set = XVECEXP (operands[2], 0, i); > > emit_move_insn (SET_DEST (set), SET_SRC (set)); > > } > > > > ... and nothing in the rtl stream says that those return registers are > > actually set by that call. Maybe we should use gen_call_value? Can we > > ever be asked to return more than a single thing here? > I was also thinking about using "gen_call_value" or "emit_clobber (r3)" > which could generate rtl: "%3:DI=call [foo]" or "call [foo]; clobber > r3". This could tell optimizer that %3 is changed. The problem with "call ; clobber r3" is that some set+use of a pseudo can be moved between these, and then rnreg can rename that to r3 again. We really need to show the call sets r3, in the general case (or that r3 is live after the call, at least). > While there are > potential issues that untyped_call may change other registers. So, mark > clobber for all touched registers maybe more safe. Well, we can derive what registers it sets, perhaps? What does x86 do here? It does something, I know that, haven't looked much deeper yet though :-) In general: this is not a problem for us only; some other archs may have found a good solution already. Segher
Segher Boessenkool <segher@kernel.crashing.org> writes: > Hi! > > On Thu, Feb 06, 2020 at 10:49:36AM +0800, Jiufu Guo wrote: >> > emit_call_insn (gen_call (operands[0], const0_rtx, const0_rtx)); >> > >> > for (i = 0; i < XVECLEN (operands[2], 0); i++) >> > { >> > rtx set = XVECEXP (operands[2], 0, i); >> > emit_move_insn (SET_DEST (set), SET_SRC (set)); >> > } >> > >> > ... and nothing in the rtl stream says that those return registers are >> > actually set by that call. Maybe we should use gen_call_value? Can we >> > ever be asked to return more than a single thing here? >> I was also thinking about using "gen_call_value" or "emit_clobber (r3)" >> which could generate rtl: "%3:DI=call [foo]" or "call [foo]; clobber >> r3". This could tell optimizer that %3 is changed. > > The problem with "call ; clobber r3" is that some set+use of a pseudo can > be moved between these, and then rnreg can rename that to r3 again. We > really need to show the call sets r3, in the general case (or that r3 is > live after the call, at least). Thanks! More careful thought You are right: set+use maybe able to move between "call ; clobber". "%3=call" is ok without this issue. > >> While there are >> potential issues that untyped_call may change other registers. So, mark >> clobber for all touched registers maybe more safe. > > Well, we can derive what registers it sets, perhaps? What does x86 do > here? It does something, I know that, haven't looked much deeper yet > though :-) For x86, it is generates something like "%c=call": Ix86_Expand_call ((TARGET_FLOAT_RETURNS_IN_80387 ? gen_rtx_REG (XCmode, FIRST_FLOAT_REG) : NULL), operands[0], const0_rtx,... first argument of ix86_expand_call is 'set of call'. As comment of untyped_call of i386.md: /* In order to give reg-stack an easier job in validating two coprocessor registers as containing a possible return value, simply pretend the untyped call returns a complex long double value. For ppc, maybe %3:DI,%3:TI, %1SF... maybe set by untyped_call, right? And from trunk source code(builtins.c and .md for targets): for (i = 0; i < XVECLEN (operands[2], 0); i++) { rtx set = XVECEXP (operands[2], 0, i); emit_move_insn (SET_DEST (set), SET_SRC (set)); } Above code may means all registers in operands[2] are stored/moved to stack, those registers maybe altered. Any corrections? Thanks for your comments and sugguestions! Jiufu > > In general: this is not a problem for us only; some other archs may have > found a good solution already. > > > Segher
Hi! On Fri, Feb 07, 2020 at 03:10:05PM +0800, Jiufu Guo wrote: > Segher Boessenkool <segher@kernel.crashing.org> writes: > > On Thu, Feb 06, 2020 at 10:49:36AM +0800, Jiufu Guo wrote: > >> > ... and nothing in the rtl stream says that those return registers are > >> > actually set by that call. Maybe we should use gen_call_value? Can we > >> > ever be asked to return more than a single thing here? > >> I was also thinking about using "gen_call_value" or "emit_clobber (r3)" > >> which could generate rtl: "%3:DI=call [foo]" or "call [foo]; clobber > >> r3". This could tell optimizer that %3 is changed. > > > > The problem with "call ; clobber r3" is that some set+use of a pseudo can > > be moved between these, and then rnreg can rename that to r3 again. We > > really need to show the call sets r3, in the general case (or that r3 is > > live after the call, at least). > Thanks! More careful thought You are right: set+use maybe able to move between "call ; > clobber". "%3=call" is ok without this issue. Yeah. And we do not know which of the register will be used for the return, in untyped_call (only untyped-return knows). But we can add clobbers of all registers that *might* be used for the return, we do know that here, see operands[2] of untyped_call. This really should be done in generic code though? Segher
On Sat, Feb 08, 2020 at 10:17:42AM -0600, Segher Boessenkool wrote: > And we do not know which of the register will be used for the return, in > untyped_call (only untyped-return knows). But we can add clobbers of all > registers that *might* be used for the return, we do know that here, see > operands[2] of untyped_call. Clobbers in parallel to the call, I mean, not as separate insns later. Segher
Segher Boessenkool <segher@kernel.crashing.org> writes: > On Sat, Feb 08, 2020 at 10:17:42AM -0600, Segher Boessenkool wrote: >> And we do not know which of the register will be used for the return, in >> untyped_call (only untyped-return knows). But we can add clobbers of all >> registers that *might* be used for the return, we do know that here, see >> operands[2] of untyped_call. > > Clobbers in parallel to the call, I mean, not as separate insns later. > Thanks Segher, as discussed, we may refine the patch by adding a 'barrier' to avoid instructions mis-scheduled. This improvement patch maybe fine for practice. Below is the updated patch: bootstrap and regtest pass on powerpc64le. Is this ok to submit to trunk? diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index f3c8eb0..2fbe68f 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -10867,6 +10867,10 @@ emit_call_insn (gen_call (operands[0], const0_rtx, const0_rtx)); + for (int i = 0; i < XVECLEN (operands[2], 0); i++) + emit_clobber (SET_SRC (XVECEXP (operands[2], 0, i))); + emit_insn (gen_blockage ()); + for (i = 0; i < XVECLEN (operands[2], 0); i++) { rtx set = XVECEXP (operands[2], 0, i); diff --git a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-2.c b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-2.c new file mode 100644 index 0000000..7719109 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-2.c @@ -0,0 +1,40 @@ +/* PR target/93047 */ +/* Originator: Andrew Church <gcczilla@achurch.org> */ +/* { dg-do run } */ +/* { dg-additional-options "-O3 -frename-registers" } */ +/* { dg-require-effective-target untyped_assembly } */ + +#ifdef __MMIX__ +/* No parameters on stack for bar. */ +#define STACK_ARGUMENTS_SIZE 0 +#else +#define STACK_ARGUMENTS_SIZE 64 +#endif + +extern void abort(void); + +int foo(int n) +{ + return n+1; +} + +int bar(int n) +{ + __builtin_return(__builtin_apply((void (*)(void))foo, __builtin_apply_args(), + STACK_ARGUMENTS_SIZE)); +} + +int main(void) +{ + /* Allocate 64 bytes on the stack to make sure that __builtin_apply + can read at least 64 bytes above the return address. */ + char dummy[64]; + + __asm__ ("" : : "" (dummy)); + + if (bar(1) != 2) + abort(); + + return 0; +} + > > Segher
On Fri, Feb 14, 2020 at 02:58:49PM +0800, Jiufu Guo wrote: > Segher Boessenkool <segher@kernel.crashing.org> writes: > > > On Sat, Feb 08, 2020 at 10:17:42AM -0600, Segher Boessenkool wrote: > >> And we do not know which of the register will be used for the return, in > >> untyped_call (only untyped-return knows). But we can add clobbers of all > >> registers that *might* be used for the return, we do know that here, see > >> operands[2] of untyped_call. > > > > Clobbers in parallel to the call, I mean, not as separate insns later. > > > Thanks Segher, as discussed, we may refine the patch by adding a > 'barrier' to avoid instructions mis-scheduled. This improvement patch > maybe fine for practice. > > Below is the updated patch: bootstrap and regtest pass on powerpc64le. > Is this ok to submit to trunk? This is okay for trunk, and for backports after a week or so. Thanks! (It doesn't fully solve the problem, as we discussed; because the return registers are not seen (by df, etc.) as live immediately after the call, new instructions using those registers can be created between the call and the later clobber. The barrier does nothing to prevent that.) Don't forget to do a changelog entry, mentioning the PR as well :-) Segher
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index f3c8eb0..d68bc24 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -10867,6 +10867,9 @@ emit_call_insn (gen_call (operands[0], const0_rtx, const0_rtx)); + for (int i = 0; i < XVECLEN (operands[2], 0); i++) + emit_clobber (SET_SRC (XVECEXP (operands[2], 0, i))); + for (i = 0; i < XVECLEN (operands[2], 0); i++) { rtx set = XVECEXP (operands[2], 0, i); diff --git a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-2.c b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-2.c new file mode 100644 index 0000000..7719109 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-return-2.c @@ -0,0 +1,40 @@ +/* PR target/93047 */ +/* Originator: Andrew Church <gcczilla@achurch.org> */ +/* { dg-do run } */ +/* { dg-additional-options "-O3 -frename-registers" } */ +/* { dg-require-effective-target untyped_assembly } */ + +#ifdef __MMIX__ +/* No parameters on stack for bar. */ +#define STACK_ARGUMENTS_SIZE 0 +#else +#define STACK_ARGUMENTS_SIZE 64 +#endif + +extern void abort(void); + +int foo(int n) +{ + return n+1; +} + +int bar(int n) +{ + __builtin_return(__builtin_apply((void (*)(void))foo, __builtin_apply_args(), + STACK_ARGUMENTS_SIZE)); +} + +int main(void) +{ + /* Allocate 64 bytes on the stack to make sure that __builtin_apply + can read at least 64 bytes above the return address. */ + char dummy[64]; + + __asm__ ("" : : "" (dummy)); + + if (bar(1) != 2) + abort(); + + return 0; +} +