[tree-tailcall] Check if function returns it's argument

On 1 December 2016 at 18:26, Richard Biener <rguenther@suse.de> wrote:
> On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>
>> On 1 December 2016 at 17:40, Richard Biener <rguenther@suse.de> wrote:
>> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 25 November 2016 at 21:17, Jeff Law <law@redhat.com> wrote:
>> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:
>> >> >
>> >> >>> For the tail-call, issue should we artificially create a lhs and use that
>> >> >>> as return value (perhaps by a separate pass before tailcall) ?
>> >> >>>
>> >> >>> __builtin_memcpy (a1, a2, a3);
>> >> >>> return a1;
>> >> >>>
>> >> >>> gets transformed to:
>> >> >>> _1 = __builtin_memcpy (a1, a2, a3)
>> >> >>> return _1;
>> >> >>>
>> >> >>> So tail-call optimization pass would see the IL in it's expected form.
>> >> >>
>> >> >>
>> >> >> As said, a RTL expert needs to chime in here.  Iff then tail-call
>> >> >> itself should do this rewrite.  But if this form is required to make
>> >> >> things work (I suppose you checked it _does_ actually work?) then
>> >> >> we'd need to make sure later passes do not undo it.  So it looks
>> >> >> fragile to me.  OTOH I seem to remember that the flags we set on
>> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is
>> >> >> verified again there?
>> >> >
>> >> > So tail calling actually sits on the border between trees and RTL.
>> >> > Essentially it's an expand-time decision as we use information from trees as
>> >> > well as low level target information.
>> >> >
>> >> > I would not expect the former sequence to tail call.  The tail calling code
>> >> > does not know that the return value from memcpy will be a1.  Thus the tail
>> >> > calling code has to assume that it'll have to copy a1 into the return
>> >> > register after returning from memcpy, which obviously can't be done if we
>> >> > tail called memcpy.
>> >> >
>> >> > The second form is much more likely to turn into a tail call sequence
>> >> > because the return value from memcpy will be sitting in the proper register.
>> >> > This form out to work for most calling conventions that allow tail calls.
>> >> >
>> >> > We could (in theory) try and exploit the fact that memcpy returns its first
>> >> > argument as a return value, but that would only be helpful on a target where
>> >> > the first argument and return value use the same register. So I'd have a
>> >> > slight preference to rewriting per Prathamesh's suggestion above since it's
>> >> > more general.
>> >> Thanks for the suggestion. The attached patch creates artificial lhs,
>> >> and returns it if the function returns it's argument and that argument
>> >> is used as return-value.
>> >>
>> >> eg:
>> >> f (void * a1, void * a2, long unsigned int a3)
>> >> {
>> >>   <bb 2> [0.0%]:
>> >>   # .MEM_5 = VDEF <.MEM_1(D)>
>> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>> >>   # VUSE <.MEM_5>
>> >>   return a1_2(D);
>> >>
>> >> }
>> >>
>> >> is transformed to:
>> >> f (void * a1, void * a2, long unsigned int a3)
>> >> {
>> >>   void * _6;
>> >>
>> >>   <bb 2> [0.0%]:
>> >>   # .MEM_5 = VDEF <.MEM_1(D)>
>> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));
>> >>   # VUSE <.MEM_5>
>> >>   return _6;
>> >>
>> >> }
>> >>
>> >> While testing, I came across an issue with function f() defined
>> >> intail-padding1.C:
>> >> struct X
>> >> {
>> >>   ~X() {}
>> >>   int n;
>> >>   char d;
>> >> };
>> >>
>> >> X f()
>> >> {
>> >>   X nrvo;
>> >>   __builtin_memset (&nrvo, 0, sizeof(X));
>> >>   return nrvo;
>> >> }
>> >>
>> >> input to the pass:
>> >> X f() ()
>> >> {
>> >>   <bb 2> [0.0%]:
>> >>   # .MEM_3 = VDEF <.MEM_1(D)>
>> >>   __builtin_memset (nrvo_2(D), 0, 8);
>> >>   # VUSE <.MEM_3>
>> >>   return nrvo_2(D);
>> >>
>> >> }
>> >>
>> >> verify_gimple_return failed with:
>> >> tail-padding1.C:13:1: error: invalid conversion in return statement
>> >>  }
>> >>  ^
>> >> struct X
>> >>
>> >> struct X &
>> >>
>> >> # VUSE <.MEM_3>
>> >> return _4;
>> >>
>> >> It seems the return type of function (struct X) differs with the type
>> >> of return value (struct X&).
>> >> Not sure how this is possible ?
>> >
>> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.
>> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
>> resolved the error.
>> Does the attached version look OK ?
>
> +                         ass_var = make_ssa_name (TREE_TYPE (arg));
>
> can you try
>
>       ass_var = copy_ssa_name (arg);
>
> instead?  That way the underlying decl should make sure the
> DECL_BY_REFERENCE check in the IL verification works.
Done in the attached version and verified tail-padding1.C passes with
the change.
Does it look OK ?
Bootstrap+test in progress on x86_64-unknown-linux-gnu.

Thanks,
Prathamesh
>
> Thanks,
> Richard.
>
>
>> Validation in progress.
>>
>> Thanks,
>> Prathamesh
>> >
>> >> To work around that, I guarded the transform on:
>> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),
>> >>                                              TREE_TYPE (retval)))
>> >>
>> >> in the patch. Does that look OK ?
>> >>
>> >> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.
>> >> Cross-tested on arm*-*-*, aarch64*-*-*.
>> >>
>> >> Thanks,
>> >> Prathamesh
>> >> >
>> >> >
>> >> > Jeff
>> >>
>> >
>> > --
>> > Richard Biener <rguenther@suse.de>
>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

[tree-tailcall] Check if function returns it's argument

Commit Message

Comments

Patch