diff mbox

[tree-tailcall] Check if function returns it's argument

Message ID CAAgBjMmjBidSVB02pv8oW5ZJhxP-_yAK9nC6e8KxkJz-EgxK7Q@mail.gmail.com
State New
Headers show

Commit Message

Prathamesh Kulkarni Dec. 1, 2016, 1:05 p.m. UTC
On 1 December 2016 at 18:26, Richard Biener <rguenther@suse.de> wrote:
> On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>
>> On 1 December 2016 at 17:40, Richard Biener <rguenther@suse.de> wrote:
>> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 25 November 2016 at 21:17, Jeff Law <law@redhat.com> wrote:
>> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
>> >> >
>> >> >>> For the tail-call, issue should we artificially create a lhs and use that
>> >> >>> as return value (perhaps by a separate pass before tailcall) ?
>> >> >>>
>> >> >>> __builtin_memcpy (a1, a2, a3);
>> >> >>> return a1;
>> >> >>>
>> >> >>> gets transformed to:
>> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
>> >> >>> return _1;
>> >> >>>
>> >> >>> So tail-call optimization pass would see the IL in it's expected form.
>> >> >>
>> >> >>
>> >> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
>> >> >> itself should do this rewrite.  But if this form is required to make
>> >> >> things work (I suppose you checked it _does_ actually work?) then
>> >> >> we'd need to make sure later passes do not undo it.  So it looks
>> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
>> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
>> >> >> verified again there?
>> >> >
>> >> > So tail calling actually sits on the border between trees and RTL.
>> >> > Essentially it's an expand-time decision as we use information from trees as
>> >> > well as low level target information.
>> >> >
>> >> > I would not expect the former sequence to tail call.  The tail calling code
>> >> > does not know that the return value from memcpy will be a1.  Thus the tail
>> >> > calling code has to assume that it'll have to copy a1 into the return
>> >> > register after returning from memcpy, which obviously can't be done if we
>> >> > tail called memcpy.
>> >> >
>> >> > The second form is much more likely to turn into a tail call sequence
>> >> > because the return value from memcpy will be sitting in the proper register.
>> >> > This form out to work for most calling conventions that allow tail calls.
>> >> >
>> >> > We could (in theory) try and exploit the fact that memcpy returns its first
>> >> > argument as a return value, but that would only be helpful on a target where
>> >> > the first argument and return value use the same register. So I'd have a
>> >> > slight preference to rewriting per Prathamesh's suggestion above since it's
>> >> > more general.
>> >> Thanks for the suggestion. The attached patch creates artificial lhs,
>> >> and returns it if the function returns it's argument and that argument
>> >> is used as return-value.
>> >>
>> >> eg:
>> >> f (void * a1, void * a2, long unsigned int a3)
>> >> {
>> >>   <bb 2> [0.0%]:
>> >>   # .MEM_5 = VDEF <.MEM_1(D)>
>> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>> >>   # VUSE <.MEM_5>
>> >>   return a1_2(D);
>> >>
>> >> }
>> >>
>> >> is transformed to:
>> >> f (void * a1, void * a2, long unsigned int a3)
>> >> {
>> >>   void * _6;
>> >>
>> >>   <bb 2> [0.0%]:
>> >>   # .MEM_5 = VDEF <.MEM_1(D)>
>> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>> >>   # VUSE <.MEM_5>
>> >>   return _6;
>> >>
>> >> }
>> >>
>> >> While testing, I came across an issue with function f() defined
>> >> intail-padding1.C:
>> >> struct X
>> >> {
>> >>   ~X() {}
>> >>   int n;
>> >>   char d;
>> >> };
>> >>
>> >> X f()
>> >> {
>> >>   X nrvo;
>> >>   __builtin_memset (&nrvo, 0, sizeof(X));
>> >>   return nrvo;
>> >> }
>> >>
>> >> input to the pass:
>> >> X f() ()
>> >> {
>> >>   <bb 2> [0.0%]:
>> >>   # .MEM_3 = VDEF <.MEM_1(D)>
>> >>   __builtin_memset (nrvo_2(D), 0, 8);
>> >>   # VUSE <.MEM_3>
>> >>   return nrvo_2(D);
>> >>
>> >> }
>> >>
>> >> verify_gimple_return failed with:
>> >> tail-padding1.C:13:1: error: invalid conversion in return statement
>> >>  }
>> >>  ^
>> >> struct X
>> >>
>> >> struct X &
>> >>
>> >> # VUSE <.MEM_3>
>> >> return _4;
>> >>
>> >> It seems the return type of function (struct X) differs with the type
>> >> of return value (struct X&).
>> >> Not sure how this is possible ?
>> >
>> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.
>> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
>> resolved the error.
>> Does the attached version look OK ?
>
> +                         ass_var = make_ssa_name (TREE_TYPE (arg));
>
> can you try
>
>       ass_var = copy_ssa_name (arg);
>
> instead?  That way the underlying decl should make sure the
> DECL_BY_REFERENCE check in the IL verification works.
Done in the attached version and verified tail-padding1.C passes with
the change.
Does it look OK ?
Bootstrap+test in progress on x86_64-unknown-linux-gnu.

Thanks,
Prathamesh
>
> Thanks,
> Richard.
>
>
>> Validation in progress.
>>
>> Thanks,
>> Prathamesh
>> >
>> >> To work around that, I guarded the transform on:
>> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
>> >>                                              TREE_TYPE (retval)))
>> >>
>> >> in the patch. Does that look OK ?
>> >>
>> >> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.
>> >> Cross-tested on arm*-*-*, aarch64*-*-*.
>> >>
>> >> Thanks,
>> >> Prathamesh
>> >> >
>> >> >
>> >> > Jeff
>> >>
>> >
>> > --
>> > Richard Biener <rguenther@suse.de>
>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

Comments

Richard Biener Dec. 1, 2016, 1:08 p.m. UTC | #1
On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:

> On 1 December 2016 at 18:26, Richard Biener <rguenther@suse.de> wrote:
> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >
> >> On 1 December 2016 at 17:40, Richard Biener <rguenther@suse.de> wrote:
> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 25 November 2016 at 21:17, Jeff Law <law@redhat.com> wrote:
> >> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
> >> >> >
> >> >> >>> For the tail-call, issue should we artificially create a lhs and use that
> >> >> >>> as return value (perhaps by a separate pass before tailcall) ?
> >> >> >>>
> >> >> >>> __builtin_memcpy (a1, a2, a3);
> >> >> >>> return a1;
> >> >> >>>
> >> >> >>> gets transformed to:
> >> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
> >> >> >>> return _1;
> >> >> >>>
> >> >> >>> So tail-call optimization pass would see the IL in it's expected form.
> >> >> >>
> >> >> >>
> >> >> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
> >> >> >> itself should do this rewrite.  But if this form is required to make
> >> >> >> things work (I suppose you checked it _does_ actually work?) then
> >> >> >> we'd need to make sure later passes do not undo it.  So it looks
> >> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
> >> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
> >> >> >> verified again there?
> >> >> >
> >> >> > So tail calling actually sits on the border between trees and RTL.
> >> >> > Essentially it's an expand-time decision as we use information from trees as
> >> >> > well as low level target information.
> >> >> >
> >> >> > I would not expect the former sequence to tail call.  The tail calling code
> >> >> > does not know that the return value from memcpy will be a1.  Thus the tail
> >> >> > calling code has to assume that it'll have to copy a1 into the return
> >> >> > register after returning from memcpy, which obviously can't be done if we
> >> >> > tail called memcpy.
> >> >> >
> >> >> > The second form is much more likely to turn into a tail call sequence
> >> >> > because the return value from memcpy will be sitting in the proper register.
> >> >> > This form out to work for most calling conventions that allow tail calls.
> >> >> >
> >> >> > We could (in theory) try and exploit the fact that memcpy returns its first
> >> >> > argument as a return value, but that would only be helpful on a target where
> >> >> > the first argument and return value use the same register. So I'd have a
> >> >> > slight preference to rewriting per Prathamesh's suggestion above since it's
> >> >> > more general.
> >> >> Thanks for the suggestion. The attached patch creates artificial lhs,
> >> >> and returns it if the function returns it's argument and that argument
> >> >> is used as return-value.
> >> >>
> >> >> eg:
> >> >> f (void * a1, void * a2, long unsigned int a3)
> >> >> {
> >> >>   <bb 2> [0.0%]:
> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >> >>   # VUSE <.MEM_5>
> >> >>   return a1_2(D);
> >> >>
> >> >> }
> >> >>
> >> >> is transformed to:
> >> >> f (void * a1, void * a2, long unsigned int a3)
> >> >> {
> >> >>   void * _6;
> >> >>
> >> >>   <bb 2> [0.0%]:
> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >> >>   # VUSE <.MEM_5>
> >> >>   return _6;
> >> >>
> >> >> }
> >> >>
> >> >> While testing, I came across an issue with function f() defined
> >> >> intail-padding1.C:
> >> >> struct X
> >> >> {
> >> >>   ~X() {}
> >> >>   int n;
> >> >>   char d;
> >> >> };
> >> >>
> >> >> X f()
> >> >> {
> >> >>   X nrvo;
> >> >>   __builtin_memset (&nrvo, 0, sizeof(X));
> >> >>   return nrvo;
> >> >> }
> >> >>
> >> >> input to the pass:
> >> >> X f() ()
> >> >> {
> >> >>   <bb 2> [0.0%]:
> >> >>   # .MEM_3 = VDEF <.MEM_1(D)>
> >> >>   __builtin_memset (nrvo_2(D), 0, 8);
> >> >>   # VUSE <.MEM_3>
> >> >>   return nrvo_2(D);
> >> >>
> >> >> }
> >> >>
> >> >> verify_gimple_return failed with:
> >> >> tail-padding1.C:13:1: error: invalid conversion in return statement
> >> >>  }
> >> >>  ^
> >> >> struct X
> >> >>
> >> >> struct X &
> >> >>
> >> >> # VUSE <.MEM_3>
> >> >> return _4;
> >> >>
> >> >> It seems the return type of function (struct X) differs with the type
> >> >> of return value (struct X&).
> >> >> Not sure how this is possible ?
> >> >
> >> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.
> >> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
> >> resolved the error.
> >> Does the attached version look OK ?
> >
> > +                         ass_var = make_ssa_name (TREE_TYPE (arg));
> >
> > can you try
> >
> >       ass_var = copy_ssa_name (arg);
> >
> > instead?  That way the underlying decl should make sure the
> > DECL_BY_REFERENCE check in the IL verification works.
> Done in the attached version and verified tail-padding1.C passes with
> the change.
> Does it look OK ?

+         if (!ass_var)
+           {
+             /* Check if function returns one if it's arguments
+                and that argument is used as return value.
+                In that case create an artificial lhs to call_stmt,
+                and set it as the return value.  */
+
+             unsigned rf = gimple_call_return_flags (call);
+             if (rf & ERF_RETURNS_ARG)
+               {
+                 unsigned argnum = rf & ERF_RETURN_ARG_MASK;
+                 if (argnum < gimple_call_num_args (call)
+                     && ret_stmt)

check && ret_stmt above together with !ass_var.

+                   {
+                     tree arg = gimple_call_arg (call, argnum);
+                     tree retval = gimple_return_retval (ret_stmt);
+                     if (retval
+                         && TREE_CODE (retval) == SSA_NAME
+                         && operand_equal_p (retval, arg, 0)
+                         && !DECL_BY_REFERENCE (DECL_RESULT 
(cfun->decl)))

the DECL_BY_REFERENCE check can be removed now (I think).

Richard.

> Bootstrap+test in progress on x86_64-unknown-linux-gnu.
> 
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Richard.
> >
> >
> >> Validation in progress.
> >>
> >> Thanks,
> >> Prathamesh
> >> >
> >> >> To work around that, I guarded the transform on:
> >> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
> >> >>                                              TREE_TYPE (retval)))
> >> >>
> >> >> in the patch. Does that look OK ?
> >> >>
> >> >> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.
> >> >> Cross-tested on arm*-*-*, aarch64*-*-*.
> >> >>
> >> >> Thanks,
> >> >> Prathamesh
> >> >> >
> >> >> >
> >> >> > Jeff
> >> >>
> >> >
> >> > --
> >> > Richard Biener <rguenther@suse.de>
> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> >>
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>
Prathamesh Kulkarni Dec. 1, 2016, 1:18 p.m. UTC | #2
On 1 December 2016 at 18:38, Richard Biener <rguenther@suse.de> wrote:
> On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>
>> On 1 December 2016 at 18:26, Richard Biener <rguenther@suse.de> wrote:
>> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 1 December 2016 at 17:40, Richard Biener <rguenther@suse.de> wrote:
>> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 25 November 2016 at 21:17, Jeff Law <law@redhat.com> wrote:
>> >> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
>> >> >> >
>> >> >> >>> For the tail-call, issue should we artificially create a lhs and use that
>> >> >> >>> as return value (perhaps by a separate pass before tailcall) ?
>> >> >> >>>
>> >> >> >>> __builtin_memcpy (a1, a2, a3);
>> >> >> >>> return a1;
>> >> >> >>>
>> >> >> >>> gets transformed to:
>> >> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
>> >> >> >>> return _1;
>> >> >> >>>
>> >> >> >>> So tail-call optimization pass would see the IL in it's expected form.
>> >> >> >>
>> >> >> >>
>> >> >> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
>> >> >> >> itself should do this rewrite.  But if this form is required to make
>> >> >> >> things work (I suppose you checked it _does_ actually work?) then
>> >> >> >> we'd need to make sure later passes do not undo it.  So it looks
>> >> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
>> >> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
>> >> >> >> verified again there?
>> >> >> >
>> >> >> > So tail calling actually sits on the border between trees and RTL.
>> >> >> > Essentially it's an expand-time decision as we use information from trees as
>> >> >> > well as low level target information.
>> >> >> >
>> >> >> > I would not expect the former sequence to tail call.  The tail calling code
>> >> >> > does not know that the return value from memcpy will be a1.  Thus the tail
>> >> >> > calling code has to assume that it'll have to copy a1 into the return
>> >> >> > register after returning from memcpy, which obviously can't be done if we
>> >> >> > tail called memcpy.
>> >> >> >
>> >> >> > The second form is much more likely to turn into a tail call sequence
>> >> >> > because the return value from memcpy will be sitting in the proper register.
>> >> >> > This form out to work for most calling conventions that allow tail calls.
>> >> >> >
>> >> >> > We could (in theory) try and exploit the fact that memcpy returns its first
>> >> >> > argument as a return value, but that would only be helpful on a target where
>> >> >> > the first argument and return value use the same register. So I'd have a
>> >> >> > slight preference to rewriting per Prathamesh's suggestion above since it's
>> >> >> > more general.
>> >> >> Thanks for the suggestion. The attached patch creates artificial lhs,
>> >> >> and returns it if the function returns it's argument and that argument
>> >> >> is used as return-value.
>> >> >>
>> >> >> eg:
>> >> >> f (void * a1, void * a2, long unsigned int a3)
>> >> >> {
>> >> >>   <bb 2> [0.0%]:
>> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
>> >> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>> >> >>   # VUSE <.MEM_5>
>> >> >>   return a1_2(D);
>> >> >>
>> >> >> }
>> >> >>
>> >> >> is transformed to:
>> >> >> f (void * a1, void * a2, long unsigned int a3)
>> >> >> {
>> >> >>   void * _6;
>> >> >>
>> >> >>   <bb 2> [0.0%]:
>> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
>> >> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>> >> >>   # VUSE <.MEM_5>
>> >> >>   return _6;
>> >> >>
>> >> >> }
>> >> >>
>> >> >> While testing, I came across an issue with function f() defined
>> >> >> intail-padding1.C:
>> >> >> struct X
>> >> >> {
>> >> >>   ~X() {}
>> >> >>   int n;
>> >> >>   char d;
>> >> >> };
>> >> >>
>> >> >> X f()
>> >> >> {
>> >> >>   X nrvo;
>> >> >>   __builtin_memset (&nrvo, 0, sizeof(X));
>> >> >>   return nrvo;
>> >> >> }
>> >> >>
>> >> >> input to the pass:
>> >> >> X f() ()
>> >> >> {
>> >> >>   <bb 2> [0.0%]:
>> >> >>   # .MEM_3 = VDEF <.MEM_1(D)>
>> >> >>   __builtin_memset (nrvo_2(D), 0, 8);
>> >> >>   # VUSE <.MEM_3>
>> >> >>   return nrvo_2(D);
>> >> >>
>> >> >> }
>> >> >>
>> >> >> verify_gimple_return failed with:
>> >> >> tail-padding1.C:13:1: error: invalid conversion in return statement
>> >> >>  }
>> >> >>  ^
>> >> >> struct X
>> >> >>
>> >> >> struct X &
>> >> >>
>> >> >> # VUSE <.MEM_3>
>> >> >> return _4;
>> >> >>
>> >> >> It seems the return type of function (struct X) differs with the type
>> >> >> of return value (struct X&).
>> >> >> Not sure how this is possible ?
>> >> >
>> >> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.
>> >> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
>> >> resolved the error.
>> >> Does the attached version look OK ?
>> >
>> > +                         ass_var = make_ssa_name (TREE_TYPE (arg));
>> >
>> > can you try
>> >
>> >       ass_var = copy_ssa_name (arg);
>> >
>> > instead?  That way the underlying decl should make sure the
>> > DECL_BY_REFERENCE check in the IL verification works.
>> Done in the attached version and verified tail-padding1.C passes with
>> the change.
>> Does it look OK ?
>
> +         if (!ass_var)
> +           {
> +             /* Check if function returns one if it's arguments
> +                and that argument is used as return value.
> +                In that case create an artificial lhs to call_stmt,
> +                and set it as the return value.  */
> +
> +             unsigned rf = gimple_call_return_flags (call);
> +             if (rf & ERF_RETURNS_ARG)
> +               {
> +                 unsigned argnum = rf & ERF_RETURN_ARG_MASK;
> +                 if (argnum < gimple_call_num_args (call)
> +                     && ret_stmt)
>
> check && ret_stmt above together with !ass_var.
Oops, sorry about that.
>
> +                   {
> +                     tree arg = gimple_call_arg (call, argnum);
> +                     tree retval = gimple_return_retval (ret_stmt);
> +                     if (retval
> +                         && TREE_CODE (retval) == SSA_NAME
> +                         && operand_equal_p (retval, arg, 0)
> +                         && !DECL_BY_REFERENCE (DECL_RESULT
> (cfun->decl)))
>
> the DECL_BY_REFERENCE check can be removed now (I think).
Well after removing DECL_BY_REFERENCE, verify_gimple still fails but
differently:

tail-padding1.C:13:1: error: RESULT_DECL should be read only when
DECL_BY_REFERENCE is set
 }
 ^
while verifying SSA_NAME nrvo_4 in statement
# .MEM_3 = VDEF <.MEM_1(D)>
    nrvo_4 = __builtin_memset (nrvo_2(D), 0, 8);
tail-padding1.C:13:1: internal compiler error: verify_ssa failed

Thanks,
Prathamesh
>
> Richard.
>
>> Bootstrap+test in progress on x86_64-unknown-linux-gnu.
>>
>> Thanks,
>> Prathamesh
>> >
>> > Thanks,
>> > Richard.
>> >
>> >
>> >> Validation in progress.
>> >>
>> >> Thanks,
>> >> Prathamesh
>> >> >
>> >> >> To work around that, I guarded the transform on:
>> >> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
>> >> >>                                              TREE_TYPE (retval)))
>> >> >>
>> >> >> in the patch. Does that look OK ?
>> >> >>
>> >> >> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.
>> >> >> Cross-tested on arm*-*-*, aarch64*-*-*.
>> >> >>
>> >> >> Thanks,
>> >> >> Prathamesh
>> >> >> >
>> >> >> >
>> >> >> > Jeff
>> >> >>
>> >> >
>> >> > --
>> >> > Richard Biener <rguenther@suse.de>
>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>> >>
>> >
>> > --
>> > Richard Biener <rguenther@suse.de>
>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Richard Biener Dec. 1, 2016, 1:22 p.m. UTC | #3
On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:

> On 1 December 2016 at 18:38, Richard Biener <rguenther@suse.de> wrote:
> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >
> >> On 1 December 2016 at 18:26, Richard Biener <rguenther@suse.de> wrote:
> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >> >
> >> >> On 1 December 2016 at 17:40, Richard Biener <rguenther@suse.de> wrote:
> >> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
> >> >> >
> >> >> >> On 25 November 2016 at 21:17, Jeff Law <law@redhat.com> wrote:
> >> >> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
> >> >> >> >
> >> >> >> >>> For the tail-call, issue should we artificially create a lhs and use that
> >> >> >> >>> as return value (perhaps by a separate pass before tailcall) ?
> >> >> >> >>>
> >> >> >> >>> __builtin_memcpy (a1, a2, a3);
> >> >> >> >>> return a1;
> >> >> >> >>>
> >> >> >> >>> gets transformed to:
> >> >> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
> >> >> >> >>> return _1;
> >> >> >> >>>
> >> >> >> >>> So tail-call optimization pass would see the IL in it's expected form.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
> >> >> >> >> itself should do this rewrite.  But if this form is required to make
> >> >> >> >> things work (I suppose you checked it _does_ actually work?) then
> >> >> >> >> we'd need to make sure later passes do not undo it.  So it looks
> >> >> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
> >> >> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
> >> >> >> >> verified again there?
> >> >> >> >
> >> >> >> > So tail calling actually sits on the border between trees and RTL.
> >> >> >> > Essentially it's an expand-time decision as we use information from trees as
> >> >> >> > well as low level target information.
> >> >> >> >
> >> >> >> > I would not expect the former sequence to tail call.  The tail calling code
> >> >> >> > does not know that the return value from memcpy will be a1.  Thus the tail
> >> >> >> > calling code has to assume that it'll have to copy a1 into the return
> >> >> >> > register after returning from memcpy, which obviously can't be done if we
> >> >> >> > tail called memcpy.
> >> >> >> >
> >> >> >> > The second form is much more likely to turn into a tail call sequence
> >> >> >> > because the return value from memcpy will be sitting in the proper register.
> >> >> >> > This form out to work for most calling conventions that allow tail calls.
> >> >> >> >
> >> >> >> > We could (in theory) try and exploit the fact that memcpy returns its first
> >> >> >> > argument as a return value, but that would only be helpful on a target where
> >> >> >> > the first argument and return value use the same register. So I'd have a
> >> >> >> > slight preference to rewriting per Prathamesh's suggestion above since it's
> >> >> >> > more general.
> >> >> >> Thanks for the suggestion. The attached patch creates artificial lhs,
> >> >> >> and returns it if the function returns it's argument and that argument
> >> >> >> is used as return-value.
> >> >> >>
> >> >> >> eg:
> >> >> >> f (void * a1, void * a2, long unsigned int a3)
> >> >> >> {
> >> >> >>   <bb 2> [0.0%]:
> >> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >> >> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >> >> >>   # VUSE <.MEM_5>
> >> >> >>   return a1_2(D);
> >> >> >>
> >> >> >> }
> >> >> >>
> >> >> >> is transformed to:
> >> >> >> f (void * a1, void * a2, long unsigned int a3)
> >> >> >> {
> >> >> >>   void * _6;
> >> >> >>
> >> >> >>   <bb 2> [0.0%]:
> >> >> >>   # .MEM_5 = VDEF <.MEM_1(D)>
> >> >> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
> >> >> >>   # VUSE <.MEM_5>
> >> >> >>   return _6;
> >> >> >>
> >> >> >> }
> >> >> >>
> >> >> >> While testing, I came across an issue with function f() defined
> >> >> >> intail-padding1.C:
> >> >> >> struct X
> >> >> >> {
> >> >> >>   ~X() {}
> >> >> >>   int n;
> >> >> >>   char d;
> >> >> >> };
> >> >> >>
> >> >> >> X f()
> >> >> >> {
> >> >> >>   X nrvo;
> >> >> >>   __builtin_memset (&nrvo, 0, sizeof(X));
> >> >> >>   return nrvo;
> >> >> >> }
> >> >> >>
> >> >> >> input to the pass:
> >> >> >> X f() ()
> >> >> >> {
> >> >> >>   <bb 2> [0.0%]:
> >> >> >>   # .MEM_3 = VDEF <.MEM_1(D)>
> >> >> >>   __builtin_memset (nrvo_2(D), 0, 8);
> >> >> >>   # VUSE <.MEM_3>
> >> >> >>   return nrvo_2(D);
> >> >> >>
> >> >> >> }
> >> >> >>
> >> >> >> verify_gimple_return failed with:
> >> >> >> tail-padding1.C:13:1: error: invalid conversion in return statement
> >> >> >>  }
> >> >> >>  ^
> >> >> >> struct X
> >> >> >>
> >> >> >> struct X &
> >> >> >>
> >> >> >> # VUSE <.MEM_3>
> >> >> >> return _4;
> >> >> >>
> >> >> >> It seems the return type of function (struct X) differs with the type
> >> >> >> of return value (struct X&).
> >> >> >> Not sure how this is possible ?
> >> >> >
> >> >> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.
> >> >> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
> >> >> resolved the error.
> >> >> Does the attached version look OK ?
> >> >
> >> > +                         ass_var = make_ssa_name (TREE_TYPE (arg));
> >> >
> >> > can you try
> >> >
> >> >       ass_var = copy_ssa_name (arg);
> >> >
> >> > instead?  That way the underlying decl should make sure the
> >> > DECL_BY_REFERENCE check in the IL verification works.
> >> Done in the attached version and verified tail-padding1.C passes with
> >> the change.
> >> Does it look OK ?
> >
> > +         if (!ass_var)
> > +           {
> > +             /* Check if function returns one if it's arguments
> > +                and that argument is used as return value.
> > +                In that case create an artificial lhs to call_stmt,
> > +                and set it as the return value.  */
> > +
> > +             unsigned rf = gimple_call_return_flags (call);
> > +             if (rf & ERF_RETURNS_ARG)
> > +               {
> > +                 unsigned argnum = rf & ERF_RETURN_ARG_MASK;
> > +                 if (argnum < gimple_call_num_args (call)
> > +                     && ret_stmt)
> >
> > check && ret_stmt above together with !ass_var.
> Oops, sorry about that.
> >
> > +                   {
> > +                     tree arg = gimple_call_arg (call, argnum);
> > +                     tree retval = gimple_return_retval (ret_stmt);
> > +                     if (retval
> > +                         && TREE_CODE (retval) == SSA_NAME
> > +                         && operand_equal_p (retval, arg, 0)
> > +                         && !DECL_BY_REFERENCE (DECL_RESULT
> > (cfun->decl)))
> >
> > the DECL_BY_REFERENCE check can be removed now (I think).
> Well after removing DECL_BY_REFERENCE, verify_gimple still fails but
> differently:
> 
> tail-padding1.C:13:1: error: RESULT_DECL should be read only when
> DECL_BY_REFERENCE is set
>  }
>  ^
> while verifying SSA_NAME nrvo_4 in statement
> # .MEM_3 = VDEF <.MEM_1(D)>
>     nrvo_4 = __builtin_memset (nrvo_2(D), 0, 8);
> tail-padding1.C:13:1: internal compiler error: verify_ssa failed

Hmm, ok.  Not sure why we enforce this.

Note that in the end this patch looks fishy -- iff we really need
the LHS on the assignment for correctness if we have the tailcall
flag set then what guarantees that later passes do not remove it
again?  So anybody removing a LHS would need to unset the tailcall flag?

Saying again that I don't know enough about the RTL part of tailcall
expansion.

Richard.

> Thanks,
> Prathamesh
> >
> > Richard.
> >
> >> Bootstrap+test in progress on x86_64-unknown-linux-gnu.
> >>
> >> Thanks,
> >> Prathamesh
> >> >
> >> > Thanks,
> >> > Richard.
> >> >
> >> >
> >> >> Validation in progress.
> >> >>
> >> >> Thanks,
> >> >> Prathamesh
> >> >> >
> >> >> >> To work around that, I guarded the transform on:
> >> >> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
> >> >> >>                                              TREE_TYPE (retval)))
> >> >> >>
> >> >> >> in the patch. Does that look OK ?
> >> >> >>
> >> >> >> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.
> >> >> >> Cross-tested on arm*-*-*, aarch64*-*-*.
> >> >> >>
> >> >> >> Thanks,
> >> >> >> Prathamesh
> >> >> >> >
> >> >> >> >
> >> >> >> > Jeff
> >> >> >>
> >> >> >
> >> >> > --
> >> >> > Richard Biener <rguenther@suse.de>
> >> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> >> >>
> >> >
> >> > --
> >> > Richard Biener <rguenther@suse.de>
> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> >>
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
> 
>
Jeff Law Dec. 1, 2016, 10:27 p.m. UTC | #4
On 12/01/2016 06:22 AM, Richard Biener wrote:
>> Well after removing DECL_BY_REFERENCE, verify_gimple still fails but
>> differently:
>>
>> tail-padding1.C:13:1: error: RESULT_DECL should be read only when
>> DECL_BY_REFERENCE is set
>>  }
>>  ^
>> while verifying SSA_NAME nrvo_4 in statement
>> # .MEM_3 = VDEF <.MEM_1(D)>
>>     nrvo_4 = __builtin_memset (nrvo_2(D), 0, 8);
>> tail-padding1.C:13:1: internal compiler error: verify_ssa failed
>
> Hmm, ok.  Not sure why we enforce this.
I don't know either.  But I would start by looking at tree-nrv.c since 
it looks (based on the variable names) that the named-value-return 
optimization kicked in.

>
> Note that in the end this patch looks fishy -- iff we really need
> the LHS on the assignment for correctness if we have the tailcall
> flag set then what guarantees that later passes do not remove it
> again?  So anybody removing a LHS would need to unset the tailcall flag?
>
> Saying again that I don't know enough about the RTL part of tailcall
> expansion.
The LHS on the assignment makes it easier to identify when a tail call 
is possible.  It's not needed for correctness.  Not having the LHS on 
the assignment just means we won't get an optimized tail call.

Under what circumstances would the LHS possibly be removed?  We know the 
return statement references the LHS, so it's not going to be something 
that DCE will do.

jeff
Richard Biener Dec. 2, 2016, 8:33 a.m. UTC | #5
On Thu, 1 Dec 2016, Jeff Law wrote:

> On 12/01/2016 06:22 AM, Richard Biener wrote:
> > > Well after removing DECL_BY_REFERENCE, verify_gimple still fails but
> > > differently:
> > > 
> > > tail-padding1.C:13:1: error: RESULT_DECL should be read only when
> > > DECL_BY_REFERENCE is set
> > >  }
> > >  ^
> > > while verifying SSA_NAME nrvo_4 in statement
> > > # .MEM_3 = VDEF <.MEM_1(D)>
> > >     nrvo_4 = __builtin_memset (nrvo_2(D), 0, 8);
> > > tail-padding1.C:13:1: internal compiler error: verify_ssa failed
> > 
> > Hmm, ok.  Not sure why we enforce this.
> I don't know either.  But I would start by looking at tree-nrv.c since it
> looks (based on the variable names) that the named-value-return optimization
> kicked in.
> 
> > 
> > Note that in the end this patch looks fishy -- iff we really need
> > the LHS on the assignment for correctness if we have the tailcall
> > flag set then what guarantees that later passes do not remove it
> > again?  So anybody removing a LHS would need to unset the tailcall flag?
> > 
> > Saying again that I don't know enough about the RTL part of tailcall
> > expansion.
> The LHS on the assignment makes it easier to identify when a tail call is
> possible.  It's not needed for correctness.  Not having the LHS on the
> assignment just means we won't get an optimized tail call.
> 
> Under what circumstances would the LHS possibly be removed?  We know the
> return statement references the LHS, so it's not going to be something that
> DCE will do.

Well, I thought Prathamesh added the optimization to copy-propagate
the lhs from the returned argument.  So we'd have both transforms here.

Of course as always the user could have written the code in this way.

If the LHS is not required for correctness then I don't think we need
to put it there - Pratamesh verified the call is tail-called already
if marked by the tailcall pass, even if the LHS is not present.

Richard.
Jeff Law Dec. 5, 2016, 10:53 p.m. UTC | #6
On 12/02/2016 01:33 AM, Richard Biener wrote:
>> The LHS on the assignment makes it easier to identify when a tail call is
>> possible.  It's not needed for correctness.  Not having the LHS on the
>> assignment just means we won't get an optimized tail call.
>>
>> Under what circumstances would the LHS possibly be removed?  We know the
>> return statement references the LHS, so it's not going to be something that
>> DCE will do.
>
> Well, I thought Prathamesh added the optimization to copy-propagate
> the lhs from the returned argument.  So we'd have both transforms here.
That seems like a mistake -- the fact that we can copy propagate the LHS 
from the returned argument is interesting, but in practice I've found it 
to not be useful to do so.

The problem is it makes the value look live across a the call and we're 
then dependent upon the register allocator to know the trick about the 
returned argument value and apply it consistently -- which it does not 
last I checked.

I think we're better off leaving the call in the form of LHS = call () 
if the return value is used.  That's going to be more palatable to tail 
calling.

>
> Of course as always the user could have written the code in this way.
>
> If the LHS is not required for correctness then I don't think we need
> to put it there - Pratamesh verified the call is tail-called already
> if marked by the tailcall pass, even if the LHS is not present.
But if the function returns the value from the tail call, then going 
through an LHS is the right thing to do.  Using the magic "argX will be 
the return value" seems clever, but actually hurts in practice.
jeff
Richard Biener Dec. 6, 2016, 8:25 a.m. UTC | #7
On Mon, 5 Dec 2016, Jeff Law wrote:

> On 12/02/2016 01:33 AM, Richard Biener wrote:
> > > The LHS on the assignment makes it easier to identify when a tail call is
> > > possible.  It's not needed for correctness.  Not having the LHS on the
> > > assignment just means we won't get an optimized tail call.
> > > 
> > > Under what circumstances would the LHS possibly be removed?  We know the
> > > return statement references the LHS, so it's not going to be something
> > > that
> > > DCE will do.
> > 
> > Well, I thought Prathamesh added the optimization to copy-propagate
> > the lhs from the returned argument.  So we'd have both transforms here.
> That seems like a mistake -- the fact that we can copy propagate the LHS from
> the returned argument is interesting, but in practice I've found it to not be
> useful to do so.
> 
> The problem is it makes the value look live across a the call and we're then
> dependent upon the register allocator to know the trick about the returned
> argument value and apply it consistently -- which it does not last I checked.
> 
> I think we're better off leaving the call in the form of LHS = call () if the
> return value is used.  That's going to be more palatable to tail calling.

Yes, that's something I also raised earlier in the thread.  Note that
any kind of value-numbering probably wants to know the equivalence
for simplifications but in the end wants to disable propagating the
copy (in fact it should propagate the return value from the point of
the call).  I suppose I know how to implement that in FRE/PRE given it has
separate value-numbering and elimination phases.  Something for GCC 8.

> > Of course as always the user could have written the code in this way.
> > 
> > If the LHS is not required for correctness then I don't think we need
> > to put it there - Pratamesh verified the call is tail-called already
> > if marked by the tailcall pass, even if the LHS is not present.
> But if the function returns the value from the tail call, then going through
> an LHS is the right thing to do.  Using the magic "argX will be the return
> value" seems clever, but actually hurts in practice.

So we do want the reverse transform (for the case of returning by
reference that's going to be tricky if not impossible due to the
IL hygiene we enforce).

Richard.
Jeff Law Dec. 7, 2016, 4:54 p.m. UTC | #8
On 12/06/2016 01:25 AM, Richard Biener wrote:
>> But if the function returns the value from the tail call, then going through
>> an LHS is the right thing to do.  Using the magic "argX will be the return
>> value" seems clever, but actually hurts in practice.
>
> So we do want the reverse transform (for the case of returning by
> reference that's going to be tricky if not impossible due to the
> IL hygiene we enforce).
It might be useful, but I'd like to see instrumentation before doing any 
significant work on this problem.

jeff
diff mbox

Patch

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
new file mode 100644
index 0000000..b3fdc6c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
@@ -0,0 +1,10 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailc-details" } */
+
+void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
+{
+  __builtin_memcpy (a1, a2, a3);
+  return a1;
+}
+
+/* { dg-final { scan-tree-dump-times "Found tail call" 1 "tailc" } } */ 
diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index 66a0a4c..6135dc2 100644
--- a/gcc/tree-tailcall.c
+++ b/gcc/tree-tailcall.c
@@ -401,6 +401,7 @@  find_tail_calls (basic_block bb, struct tailcall **ret)
   basic_block abb;
   size_t idx;
   tree var;
+  greturn *ret_stmt = NULL;
 
   if (!single_succ_p (bb))
     return;
@@ -408,6 +409,8 @@  find_tail_calls (basic_block bb, struct tailcall **ret)
   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
     {
       stmt = gsi_stmt (gsi);
+      if (!ret_stmt)
+	ret_stmt = dyn_cast<greturn *> (stmt);
 
       /* Ignore labels, returns, nops, clobbers and debug stmts.  */
       if (gimple_code (stmt) == GIMPLE_LABEL
@@ -422,6 +425,36 @@  find_tail_calls (basic_block bb, struct tailcall **ret)
 	{
 	  call = as_a <gcall *> (stmt);
 	  ass_var = gimple_call_lhs (call);
+	  if (!ass_var)
+	    {
+	      /* Check if function returns one if it's arguments
+		 and that argument is used as return value.
+		 In that case create an artificial lhs to call_stmt,
+		 and set it as the return value.  */
+
+	      unsigned rf = gimple_call_return_flags (call);
+	      if (rf & ERF_RETURNS_ARG)
+		{
+		  unsigned argnum = rf & ERF_RETURN_ARG_MASK;
+		  if (argnum < gimple_call_num_args (call)
+		      && ret_stmt)
+		    {
+		      tree arg = gimple_call_arg (call, argnum);
+		      tree retval = gimple_return_retval (ret_stmt);
+		      if (retval
+			  && TREE_CODE (retval) == SSA_NAME
+			  && operand_equal_p (retval, arg, 0)
+			  && !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl)))
+			{
+			  ass_var = copy_ssa_name (arg);
+			  gimple_call_set_lhs (call, ass_var);
+			  update_stmt (call);
+			  gimple_return_set_retval (ret_stmt, ass_var);
+			  update_stmt (ret_stmt);
+			}
+		    }
+		}
+	    }
 	  break;
 	}