diff mbox

[RFC,X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

Message ID CAAs8Hmxg2a77HdJuedSkO8EkfROvSx_V2VOcZtWUAEiXLC3G6g@mail.gmail.com
State New
Headers show

Commit Message

Sriraman Tallam May 29, 2015, 9:37 p.m. UTC
On Fri, May 29, 2015 at 12:35 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>       * config/i386/i386.c (avoid_plt_to_call): New function.
>>       (ix86_output_call_insn): Generate indirect call for functions
>>       marked with "noplt" attribute.
>>       (attribute_spec ix86_attribute_): Define new attribute "noplt".
>>       * doc/extend.texi: Document new attribute "noplt".
>>       * gcc.target/i386/noplt-1.c: New testcase.
>>       * gcc.target/i386/noplt-2.c: New testcase.
>>
>> Index: config/i386/i386.c
>> ===================================================================
>> --- config/i386/i386.c        (revision 223720)
>> +++ config/i386/i386.c        (working copy)
>> @@ -25599,6 +25599,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
>>    return call;
>>  }
>>
>> +/* Return true if the function being called was marked with attribute
>> +   "noplt".  If this function is defined, this should return false.  */
>> +static bool
>> +avoid_plt_to_call (rtx call_op)
>> +{
>> +  if (SYMBOL_REF_LOCAL_P (call_op))
>> +    return false;
>> +
>> +  tree symbol_decl = SYMBOL_REF_DECL (call_op);
>> +
>> +  if (symbol_decl != NULL_TREE
>> +      && TREE_CODE (symbol_decl) == FUNCTION_DECL
>> +      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
>> +    return true;
>> +
>> +  return false;
>> +}
>
> OK, now we have __attribute__ (optimize("noplt")) which binds to the caller and makes
> all calls in the function to skip PLT and __attribute__ ("noplt") which binds to callee
> and makes all calls to function to not use PLT.
>
> That sort of makes sense to me, but why "noplt" attribute is not implemented at generic level
> just like -fplt? Is it only because every target supporting PLT would need update in its
> call expansion patterns?

Yes, that is what I had in mind.

>
> Also I think the PLT calls have EBX in call fusage wich is added by ix86_expand_call.
>   else
>     {
>       /* Static functions and indirect calls don't need the pic register.  */
>       if (flag_pic
>           && (!TARGET_64BIT
>               || (ix86_cmodel == CM_LARGE_PIC
>                   && DEFAULT_ABI != MS_ABI))
>           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>           && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
>         {
>           use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
>           if (ix86_use_pseudo_pic_reg ())
>             emit_move_insn (gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM),
>                             pic_offset_table_rtx);
>         }
>
> I think you want to take that away from FUSAGE there just like we do for local calls
> (and in fact the code should already check flag_pic && flag_plt I suppose.

Done that now and patch attached.

Thanks
Sri

>
> Honza
* config/i386/i386.c (avoid_plt_to_call): New function.
	(ix86_expand_call): Dont use the PIC register when external function
	calls are not made via PLT.
	(ix86_output_call_insn): Generate indirect call for functions
	marked with "noplt" attribute.
	(attribute_spec ix86_attribute_): Define new attribute "noplt".
	* doc/extend.texi: Document new attribute "noplt".
	* gcc.target/i386/noplt-1.c: New testcase.
	* gcc.target/i386/noplt-2.c: New testcase.

Comments

Sriraman Tallam May 29, 2015, 11:36 p.m. UTC | #1
On Fri, May 29, 2015 at 3:24 PM, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
>
>
> On Friday, 29 May 2015, Sriraman Tallam <tmsriram@google.com> wrote:
>>
>> On Fri, May 29, 2015 at 12:35 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> >>       * config/i386/i386.c (avoid_plt_to_call): New function.
>> >>       (ix86_output_call_insn): Generate indirect call for functions
>> >>       marked with "noplt" attribute.
>> >>       (attribute_spec ix86_attribute_): Define new attribute "noplt".
>> >>       * doc/extend.texi: Document new attribute "noplt".
>> >>       * gcc.target/i386/noplt-1.c: New testcase.
>> >>       * gcc.target/i386/noplt-2.c: New testcase.
>> >>
>> >> Index: config/i386/i386.c
>> >> ===================================================================
>> >> --- config/i386/i386.c        (revision 223720)
>> >> +++ config/i386/i386.c        (working copy)
>> >> @@ -25599,6 +25599,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx
>> >> call
>> >>    return call;
>> >>  }
>> >>
>> >> +/* Return true if the function being called was marked with attribute
>> >> +   "noplt".  If this function is defined, this should return false.
>> >> */
>> >> +static bool
>> >> +avoid_plt_to_call (rtx call_op)
>> >> +{
>> >> +  if (SYMBOL_REF_LOCAL_P (call_op))
>> >> +    return false;
>> >> +
>> >> +  tree symbol_decl = SYMBOL_REF_DECL (call_op);
>> >> +
>> >> +  if (symbol_decl != NULL_TREE
>> >> +      && TREE_CODE (symbol_decl) == FUNCTION_DECL
>> >> +      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
>> >> +    return true;
>> >> +
>> >> +  return false;
>> >> +}
>> >
>> > OK, now we have __attribute__ (optimize("noplt")) which binds to the
>> > caller and makes
>> > all calls in the function to skip PLT and __attribute__ ("noplt") which
>> > binds to callee
>> > and makes all calls to function to not use PLT.
>> >
>> > That sort of makes sense to me, but why "noplt" attribute is not
>> > implemented at generic level
>> > just like -fplt? Is it only because every target supporting PLT would
>> > need update in its
>> > call expansion patterns?
>>
>> Yes, that is what I had in mind.
>>
>
>
> Why isn't it just an indirect call in the cases that would require a GOT
> slot and a direct call otherwise ? I'm trying to work out what's so
> different on each target that mandates this to be in the target backend.
> Also it would be better to push the tests into gcc.dg if you can and check
> for the absence of a relocation so that folks at least see these as being
> UNSUPPORTED on their target.

I am not familiar with PLT calls for other targets.  I can move the
tests to gcc.dg but what relocation are you suggesting I check for?

Thanks
Sri


>
>
>
> Ramana
>>
>> >
>> > Also I think the PLT calls have EBX in call fusage wich is added by
>> > ix86_expand_call.
>> >   else
>> >     {
>> >       /* Static functions and indirect calls don't need the pic
>> > register.  */
>> >       if (flag_pic
>> >           && (!TARGET_64BIT
>> >               || (ix86_cmodel == CM_LARGE_PIC
>> >                   && DEFAULT_ABI != MS_ABI))
>> >           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>> >           && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
>> >         {
>> >           use_reg (&use, gen_rtx_REG (Pmode,
>> > REAL_PIC_OFFSET_TABLE_REGNUM));
>> >           if (ix86_use_pseudo_pic_reg ())
>> >             emit_move_insn (gen_rtx_REG (Pmode,
>> > REAL_PIC_OFFSET_TABLE_REGNUM),
>> >                             pic_offset_table_rtx);
>> >         }
>> >
>> > I think you want to take that away from FUSAGE there just like we do for
>> > local calls
>> > (and in fact the code should already check flag_pic && flag_plt I
>> > suppose.
>>
>> Done that now and patch attached.
>>
>> Thanks
>> Sri
>>
>> >
>> > Honza
Ramana Radhakrishnan June 1, 2015, 8:24 a.m. UTC | #2
>> Why isn't it just an indirect call in the cases that would require a GOT
>> slot and a direct call otherwise ? I'm trying to work out what's so
>> different on each target that mandates this to be in the target backend.
>> Also it would be better to push the tests into gcc.dg if you can and check
>> for the absence of a relocation so that folks at least see these as being
>> UNSUPPORTED on their target.
>


To be even more explicit, shouldn't this be handled similar to the way 
in which -fno-plt is handled in a target agnostic manner ? After all, if 
you can handle this for the command line, doing the same for a function 
which has been decorated with attribute((noplt)) should be simple.

> I am not familiar with PLT calls for other targets.  I can move the
> tests to gcc.dg but what relocation are you suggesting I check for?

Move the test to gcc.dg, add a target_support_no_plt function in 
testsuite/lib/target-supports.exp and mark this as being supported only 
on x86 and use scan-assembler to scan for PLT relocations for x86. Other 
targets can add things as they deem fit.

In any case, on a large number of elf/ linux targets I would have 
thought the absence of a JMP_SLOT relocation would be good enough to 
check that this is working correctly.

regards
Ramana



>
> Thanks
> Sri
>
>
>>
>>
>>
>> Ramana
>>>
>>>>
>>>> Also I think the PLT calls have EBX in call fusage wich is added by
>>>> ix86_expand_call.
>>>>    else
>>>>      {
>>>>        /* Static functions and indirect calls don't need the pic
>>>> register.  */
>>>>        if (flag_pic
>>>>            && (!TARGET_64BIT
>>>>                || (ix86_cmodel == CM_LARGE_PIC
>>>>                    && DEFAULT_ABI != MS_ABI))
>>>>            && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>>>>            && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
>>>>          {
>>>>            use_reg (&use, gen_rtx_REG (Pmode,
>>>> REAL_PIC_OFFSET_TABLE_REGNUM));
>>>>            if (ix86_use_pseudo_pic_reg ())
>>>>              emit_move_insn (gen_rtx_REG (Pmode,
>>>> REAL_PIC_OFFSET_TABLE_REGNUM),
>>>>                              pic_offset_table_rtx);
>>>>          }
>>>>
>>>> I think you want to take that away from FUSAGE there just like we do for
>>>> local calls
>>>> (and in fact the code should already check flag_pic && flag_plt I
>>>> suppose.
>>>
>>> Done that now and patch attached.
>>>
>>> Thanks
>>> Sri
>>>
>>>>
>>>> Honza
>
Sriraman Tallam June 1, 2015, 6:01 p.m. UTC | #3
On Mon, Jun 1, 2015 at 1:24 AM, Ramana Radhakrishnan
<ramana.radhakrishnan@arm.com> wrote:
>
>>> Why isn't it just an indirect call in the cases that would require a GOT
>>> slot and a direct call otherwise ? I'm trying to work out what's so
>>> different on each target that mandates this to be in the target backend.
>>> Also it would be better to push the tests into gcc.dg if you can and
>>> check
>>> for the absence of a relocation so that folks at least see these as being
>>> UNSUPPORTED on their target.
>>
>>
>
>
> To be even more explicit, shouldn't this be handled similar to the way in
> which -fno-plt is handled in a target agnostic manner ? After all, if you
> can handle this for the command line, doing the same for a function which
> has been decorated with attribute((noplt)) should be simple.

-fno-plt does not work for non-PIC code, having non-PIC code not use
PLT was my primary motivation.  Infact, if you go back in this thread,
I suggested to HJ if I should piggyback on -fno-plt.  I tried using
the -fno-plt implementation to do this by removing the flag_pic check
in calls.c, but that does not still work for non-PIC code.

>
>> I am not familiar with PLT calls for other targets.  I can move the
>> tests to gcc.dg but what relocation are you suggesting I check for?
>
>
> Move the test to gcc.dg, add a target_support_no_plt function in
> testsuite/lib/target-supports.exp and mark this as being supported only on
> x86 and use scan-assembler to scan for PLT relocations for x86. Other
> targets can add things as they deem fit.

>
> In any case, on a large number of elf/ linux targets I would have thought
> the absence of a JMP_SLOT relocation would be good enough to check that this
> is working correctly.
>
> regards
> Ramana
>
>
>
>
>>
>> Thanks
>> Sri
>>
>>
>>>
>>>
>>>
>>> Ramana
>>>>
>>>>
>>>>>
>>>>> Also I think the PLT calls have EBX in call fusage wich is added by
>>>>> ix86_expand_call.
>>>>>    else
>>>>>      {
>>>>>        /* Static functions and indirect calls don't need the pic
>>>>> register.  */
>>>>>        if (flag_pic
>>>>>            && (!TARGET_64BIT
>>>>>                || (ix86_cmodel == CM_LARGE_PIC
>>>>>                    && DEFAULT_ABI != MS_ABI))
>>>>>            && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>>>>>            && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
>>>>>          {
>>>>>            use_reg (&use, gen_rtx_REG (Pmode,
>>>>> REAL_PIC_OFFSET_TABLE_REGNUM));
>>>>>            if (ix86_use_pseudo_pic_reg ())
>>>>>              emit_move_insn (gen_rtx_REG (Pmode,
>>>>> REAL_PIC_OFFSET_TABLE_REGNUM),
>>>>>                              pic_offset_table_rtx);
>>>>>          }
>>>>>
>>>>> I think you want to take that away from FUSAGE there just like we do
>>>>> for
>>>>> local calls
>>>>> (and in fact the code should already check flag_pic && flag_plt I
>>>>> suppose.
>>>>
>>>>
>>>> Done that now and patch attached.
>>>>
>>>> Thanks
>>>> Sri
>>>>
>>>>>
>>>>> Honza
>>
>>
>
Ramana Radhakrishnan June 1, 2015, 6:41 p.m. UTC | #4
On Mon, Jun 1, 2015 at 7:01 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Mon, Jun 1, 2015 at 1:24 AM, Ramana Radhakrishnan
> <ramana.radhakrishnan@arm.com> wrote:
>>
>>>> Why isn't it just an indirect call in the cases that would require a GOT
>>>> slot and a direct call otherwise ? I'm trying to work out what's so
>>>> different on each target that mandates this to be in the target backend.
>>>> Also it would be better to push the tests into gcc.dg if you can and
>>>> check
>>>> for the absence of a relocation so that folks at least see these as being
>>>> UNSUPPORTED on their target.
>>>
>>>
>>
>>
>> To be even more explicit, shouldn't this be handled similar to the way in
>> which -fno-plt is handled in a target agnostic manner ? After all, if you
>> can handle this for the command line, doing the same for a function which
>> has been decorated with attribute((noplt)) should be simple.
>
> -fno-plt does not work for non-PIC code, having non-PIC code not use
> PLT was my primary motivation.  Infact, if you go back in this thread,
> I suggested to HJ if I should piggyback on -fno-plt.  I tried using
> the -fno-plt implementation to do this by removing the flag_pic check
> in calls.c, but that does not still work for non-PIC code.

You're missing my point, unless I'm missing something basic here - I
should have been even more explicit and said -fPIC was a given in all
this discussion.

calls.c:229 has

else if (flag_pic && !flag_plt && fndecl_or_type
           && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
           && !targetm.binds_local_p (fndecl_or_type))

why can't we merge the check in here for the attribute noplt ?

If a new attribute is added to the "GNU language" in this case, why
isn't this being treated in the same way as the command line option
has been treated ? All this means is that we add an attribute and a
command line option to common code and then not implement it in a
proper target agnostic fashion.

regards
Ramana


>
>>
>>> I am not familiar with PLT calls for other targets.  I can move the
>>> tests to gcc.dg but what relocation are you suggesting I check for?
>>
>>
>> Move the test to gcc.dg, add a target_support_no_plt function in
>> testsuite/lib/target-supports.exp and mark this as being supported only on
>> x86 and use scan-assembler to scan for PLT relocations for x86. Other
>> targets can add things as they deem fit.
>
>>
>> In any case, on a large number of elf/ linux targets I would have thought
>> the absence of a JMP_SLOT relocation would be good enough to check that this
>> is working correctly.
>>
>> regards
>> Ramana
>>
>>
>>
>>
>>>
>>> Thanks
>>> Sri
>>>
>>>
>>>>
>>>>
>>>>
>>>> Ramana
>>>>>
>>>>>
>>>>>>
>>>>>> Also I think the PLT calls have EBX in call fusage wich is added by
>>>>>> ix86_expand_call.
>>>>>>    else
>>>>>>      {
>>>>>>        /* Static functions and indirect calls don't need the pic
>>>>>> register.  */
>>>>>>        if (flag_pic
>>>>>>            && (!TARGET_64BIT
>>>>>>                || (ix86_cmodel == CM_LARGE_PIC
>>>>>>                    && DEFAULT_ABI != MS_ABI))
>>>>>>            && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>>>>>>            && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
>>>>>>          {
>>>>>>            use_reg (&use, gen_rtx_REG (Pmode,
>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM));
>>>>>>            if (ix86_use_pseudo_pic_reg ())
>>>>>>              emit_move_insn (gen_rtx_REG (Pmode,
>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM),
>>>>>>                              pic_offset_table_rtx);
>>>>>>          }
>>>>>>
>>>>>> I think you want to take that away from FUSAGE there just like we do
>>>>>> for
>>>>>> local calls
>>>>>> (and in fact the code should already check flag_pic && flag_plt I
>>>>>> suppose.
>>>>>
>>>>>
>>>>> Done that now and patch attached.
>>>>>
>>>>> Thanks
>>>>> Sri
>>>>>
>>>>>>
>>>>>> Honza
>>>
>>>
>>
Sriraman Tallam June 1, 2015, 6:55 p.m. UTC | #5
On Mon, Jun 1, 2015 at 11:41 AM, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
> On Mon, Jun 1, 2015 at 7:01 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>> On Mon, Jun 1, 2015 at 1:24 AM, Ramana Radhakrishnan
>> <ramana.radhakrishnan@arm.com> wrote:
>>>
>>>>> Why isn't it just an indirect call in the cases that would require a GOT
>>>>> slot and a direct call otherwise ? I'm trying to work out what's so
>>>>> different on each target that mandates this to be in the target backend.
>>>>> Also it would be better to push the tests into gcc.dg if you can and
>>>>> check
>>>>> for the absence of a relocation so that folks at least see these as being
>>>>> UNSUPPORTED on their target.
>>>>
>>>>
>>>
>>>
>>> To be even more explicit, shouldn't this be handled similar to the way in
>>> which -fno-plt is handled in a target agnostic manner ? After all, if you
>>> can handle this for the command line, doing the same for a function which
>>> has been decorated with attribute((noplt)) should be simple.
>>
>> -fno-plt does not work for non-PIC code, having non-PIC code not use
>> PLT was my primary motivation.  Infact, if you go back in this thread,
>> I suggested to HJ if I should piggyback on -fno-plt.  I tried using
>> the -fno-plt implementation to do this by removing the flag_pic check
>> in calls.c, but that does not still work for non-PIC code.
>
> You're missing my point, unless I'm missing something basic here - I
> should have been even more explicit and said -fPIC was a given in all
> this discussion.
>
> calls.c:229 has
>
> else if (flag_pic && !flag_plt && fndecl_or_type
>            && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
>            && !targetm.binds_local_p (fndecl_or_type))
>
> why can't we merge the check in here for the attribute noplt ?

We can and and please see this thread, that is the exact patch I proposed :
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02682.html

However, there was one caveat.  I want this working without -fPIC too.
non-PIC code also generates PLT calls and I want them eliminated.

>
> If a new attribute is added to the "GNU language" in this case, why
> isn't this being treated in the same way as the command line option
> has been treated ? All this means is that we add an attribute and a
> command line option to common code and then not implement it in a
> proper target agnostic fashion.

You are right.  This is the way I wanted it too but I also wanted the
attribute to work without PIC. PLT calls are generated without -fPIC
and -fPIE too and I wanted a solution for that.  On looking at the
code in more detail,

* -fno-plt is made to work with -fPIC, is there a reason to not make
it work for non-PIC code?  I can remove the flag_pic check from
calls.c
* Then, I add the generic attribute "noplt" and everything is fine.

There is just one caveat with the above approach, for x86_64
(*call_insn) will not generate indirect-calls for *non-PIC* code
because constant_call_address_operand in predicates.md will evaluate
to false.  This can be fixed appropriately in ix86_output_call_insn in
i386.c.


Is this alright?  Sorry for the confusion, but the primary reason why
I did not do it the way you suggested is because we wanted "noplt"
attribute to work for non-PIC code also.

Thanks
Sri

>
> regards
> Ramana
>
>
>>
>>>
>>>> I am not familiar with PLT calls for other targets.  I can move the
>>>> tests to gcc.dg but what relocation are you suggesting I check for?
>>>
>>>
>>> Move the test to gcc.dg, add a target_support_no_plt function in
>>> testsuite/lib/target-supports.exp and mark this as being supported only on
>>> x86 and use scan-assembler to scan for PLT relocations for x86. Other
>>> targets can add things as they deem fit.
>>
>>>
>>> In any case, on a large number of elf/ linux targets I would have thought
>>> the absence of a JMP_SLOT relocation would be good enough to check that this
>>> is working correctly.
>>>
>>> regards
>>> Ramana
>>>
>>>
>>>
>>>
>>>>
>>>> Thanks
>>>> Sri
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>> Ramana
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Also I think the PLT calls have EBX in call fusage wich is added by
>>>>>>> ix86_expand_call.
>>>>>>>    else
>>>>>>>      {
>>>>>>>        /* Static functions and indirect calls don't need the pic
>>>>>>> register.  */
>>>>>>>        if (flag_pic
>>>>>>>            && (!TARGET_64BIT
>>>>>>>                || (ix86_cmodel == CM_LARGE_PIC
>>>>>>>                    && DEFAULT_ABI != MS_ABI))
>>>>>>>            && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>>>>>>>            && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
>>>>>>>          {
>>>>>>>            use_reg (&use, gen_rtx_REG (Pmode,
>>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM));
>>>>>>>            if (ix86_use_pseudo_pic_reg ())
>>>>>>>              emit_move_insn (gen_rtx_REG (Pmode,
>>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM),
>>>>>>>                              pic_offset_table_rtx);
>>>>>>>          }
>>>>>>>
>>>>>>> I think you want to take that away from FUSAGE there just like we do
>>>>>>> for
>>>>>>> local calls
>>>>>>> (and in fact the code should already check flag_pic && flag_plt I
>>>>>>> suppose.
>>>>>>
>>>>>>
>>>>>> Done that now and patch attached.
>>>>>>
>>>>>> Thanks
>>>>>> Sri
>>>>>>
>>>>>>>
>>>>>>> Honza
>>>>
>>>>
>>>
Ramana Radhakrishnan June 1, 2015, 8:33 p.m. UTC | #6
On Mon, Jun 1, 2015 at 7:55 PM, Sriraman Tallam <tmsriram@google.com> wrote:
> On Mon, Jun 1, 2015 at 11:41 AM, Ramana Radhakrishnan
> <ramana.gcc@googlemail.com> wrote:
>> On Mon, Jun 1, 2015 at 7:01 PM, Sriraman Tallam <tmsriram@google.com> wrote:
>>> On Mon, Jun 1, 2015 at 1:24 AM, Ramana Radhakrishnan
>>> <ramana.radhakrishnan@arm.com> wrote:
>>>>
>>>>>> Why isn't it just an indirect call in the cases that would require a GOT
>>>>>> slot and a direct call otherwise ? I'm trying to work out what's so
>>>>>> different on each target that mandates this to be in the target backend.
>>>>>> Also it would be better to push the tests into gcc.dg if you can and
>>>>>> check
>>>>>> for the absence of a relocation so that folks at least see these as being
>>>>>> UNSUPPORTED on their target.
>>>>>
>>>>>
>>>>
>>>>
>>>> To be even more explicit, shouldn't this be handled similar to the way in
>>>> which -fno-plt is handled in a target agnostic manner ? After all, if you
>>>> can handle this for the command line, doing the same for a function which
>>>> has been decorated with attribute((noplt)) should be simple.
>>>
>>> -fno-plt does not work for non-PIC code, having non-PIC code not use
>>> PLT was my primary motivation.  Infact, if you go back in this thread,
>>> I suggested to HJ if I should piggyback on -fno-plt.  I tried using
>>> the -fno-plt implementation to do this by removing the flag_pic check
>>> in calls.c, but that does not still work for non-PIC code.

If you want __attribute__ ((noplt)) to work for non-PIC code, we
should look to code it in the same place surely by making all
__attribute__((noplt)) calls, indirect calls irrespective of whether
it's fpic or not.


>>
>> You're missing my point, unless I'm missing something basic here - I
>> should have been even more explicit and said -fPIC was a given in all
>> this discussion.
>>
>> calls.c:229 has
>>
>> else if (flag_pic && !flag_plt && fndecl_or_type
>>            && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
>>            && !targetm.binds_local_p (fndecl_or_type))
>>
>> why can't we merge the check in here for the attribute noplt ?
>
> We can and and please see this thread, that is the exact patch I proposed :
> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02682.html
>
> However, there was one caveat.  I want this working without -fPIC too.
> non-PIC code also generates PLT calls and I want them eliminated.
>
>>
>> If a new attribute is added to the "GNU language" in this case, why
>> isn't this being treated in the same way as the command line option
>> has been treated ? All this means is that we add an attribute and a
>> command line option to common code and then not implement it in a
>> proper target agnostic fashion.
>
> You are right.  This is the way I wanted it too but I also wanted the
> attribute to work without PIC. PLT calls are generated without -fPIC
> and -fPIE too and I wanted a solution for that.  On looking at the
> code in more detail,
>
> * -fno-plt is made to work with -fPIC, is there a reason to not make
> it work for non-PIC code?  I can remove the flag_pic check from
> calls.c

I don't think that's right, you probably have to allow that along with
(flag_pic || (decl && attribute_no_plt (decl)) - however it seems odd
to me that the language extension allows this but the flag doesn't.

> * Then, I add the generic attribute "noplt" and everything is fine.
>
> There is just one caveat with the above approach, for x86_64
> (*call_insn) will not generate indirect-calls for *non-PIC* code
> because constant_call_address_operand in predicates.md will evaluate
> to false.  This can be fixed appropriately in ix86_output_call_insn in
> i386.c.

Yes, targets need to massage that into place but that's essentially
the mechanics of retaining indirect calls in each backend. -fno-plt
doesn't work for ARM / AArch64 with optimizers currently (and I
suspect on most other targets) because our predicates are too liberal,
fixed by treating "noplt" or -fno-plt as the equivalent of
-mlong-calls.

>
>
> Is this alright?  Sorry for the confusion, but the primary reason why
> I did not do it the way you suggested is because we wanted "noplt"
> attribute to work for non-PIC code also.

If that is the case, then this is a slightly more complicated
condition in the same place. We then always have indirect calls for
functions that are marked noplt and just have target generate this
appropriately.

To be honest, this is trivial to implement in the ARM backend as one
would just piggy back on the longcalls work - despite that, IMNSHO
it's best done in a target independent manner.

regards
Ramana

>
> Thanks
> Sri
>
>>
>> regards
>> Ramana
>>
>>
>>>
>>>>
>>>>> I am not familiar with PLT calls for other targets.  I can move the
>>>>> tests to gcc.dg but what relocation are you suggesting I check for?
>>>>
>>>>
>>>> Move the test to gcc.dg, add a target_support_no_plt function in
>>>> testsuite/lib/target-supports.exp and mark this as being supported only on
>>>> x86 and use scan-assembler to scan for PLT relocations for x86. Other
>>>> targets can add things as they deem fit.
>>>
>>>>
>>>> In any case, on a large number of elf/ linux targets I would have thought
>>>> the absence of a JMP_SLOT relocation would be good enough to check that this
>>>> is working correctly.
>>>>
>>>> regards
>>>> Ramana
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> Thanks
>>>>> Sri
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Ramana
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Also I think the PLT calls have EBX in call fusage wich is added by
>>>>>>>> ix86_expand_call.
>>>>>>>>    else
>>>>>>>>      {
>>>>>>>>        /* Static functions and indirect calls don't need the pic
>>>>>>>> register.  */
>>>>>>>>        if (flag_pic
>>>>>>>>            && (!TARGET_64BIT
>>>>>>>>                || (ix86_cmodel == CM_LARGE_PIC
>>>>>>>>                    && DEFAULT_ABI != MS_ABI))
>>>>>>>>            && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
>>>>>>>>            && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
>>>>>>>>          {
>>>>>>>>            use_reg (&use, gen_rtx_REG (Pmode,
>>>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM));
>>>>>>>>            if (ix86_use_pseudo_pic_reg ())
>>>>>>>>              emit_move_insn (gen_rtx_REG (Pmode,
>>>>>>>> REAL_PIC_OFFSET_TABLE_REGNUM),
>>>>>>>>                              pic_offset_table_rtx);
>>>>>>>>          }
>>>>>>>>
>>>>>>>> I think you want to take that away from FUSAGE there just like we do
>>>>>>>> for
>>>>>>>> local calls
>>>>>>>> (and in fact the code should already check flag_pic && flag_plt I
>>>>>>>> suppose.
>>>>>>>
>>>>>>>
>>>>>>> Done that now and patch attached.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Sri
>>>>>>>
>>>>>>>>
>>>>>>>> Honza
>>>>>
>>>>>
>>>>
diff mbox

Patch

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 223720)
+++ config/i386/i386.c	(working copy)
@@ -25475,6 +25475,28 @@  construct_plt_address (rtx symbol)
   return tmp;
 }
 
+/* Return true if the function being called was marked with attribute
+   "noplt".  If this function is defined, this should return false.  This
+   is currently used only with 64-bit ELF targets.  */
+static bool
+avoid_plt_to_call (rtx call_op)
+{
+  if (!TARGET_64BIT || TARGET_MACHO|| TARGET_SEH || TARGET_PECOFF)
+    return false;
+
+  if (SYMBOL_REF_LOCAL_P (call_op))
+    return false;
+
+  tree symbol_decl = SYMBOL_REF_DECL (call_op);
+
+  if (symbol_decl != NULL_TREE
+      && TREE_CODE (symbol_decl) == FUNCTION_DECL
+      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
+    return true;
+
+  return false;
+}
+
 rtx
 ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
 		  rtx callarg2,
@@ -25497,13 +25519,16 @@  ix86_expand_call (rtx retval, rtx fnaddr, rtx call
     }
   else
     {
-      /* Static functions and indirect calls don't need the pic register.  */
+      /* Static functions and indirect calls don't need the pic register.  Also,
+	 check if PLT was explicitly avoided via no-plt or "noplt" attribute, making
+	 it an indirect call.  */
       if (flag_pic
 	  && (!TARGET_64BIT
 	      || (ix86_cmodel == CM_LARGE_PIC
 		  && DEFAULT_ABI != MS_ABI))
 	  && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
-	  && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
+	  && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))
+	  && flag_plt && !avoid_plt_to_call (XEXP (fnaddr, 0)))
 	{
 	  use_reg (&use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
 	  if (ix86_use_pseudo_pic_reg ())
@@ -25611,7 +25636,13 @@  ix86_output_call_insn (rtx_insn *insn, rtx call_op
   if (SIBLING_CALL_P (insn))
     {
       if (direct_p)
-	xasm = "%!jmp\t%P0";
+	{
+	  if (!TARGET_MACHO && !TARGET_SEH && !TARGET_PECOFF
+	      && TARGET_64BIT && avoid_plt_to_call (call_op))
+	    xasm = "%!jmp\t*%p0@GOTPCREL(%%rip)";
+	  else
+	    xasm = "%!jmp\t%P0";
+	}
       /* SEH epilogue detection requires the indirect branch case
 	 to include REX.W.  */
       else if (TARGET_SEH)
@@ -25654,7 +25685,13 @@  ix86_output_call_insn (rtx_insn *insn, rtx call_op
     }
 
   if (direct_p)
-    xasm = "%!call\t%P0";
+    {
+      if (!TARGET_MACHO && !TARGET_SEH && !TARGET_PECOFF
+	  && TARGET_64BIT && avoid_plt_to_call (call_op))
+        xasm = "%!call\t*%p0@GOTPCREL(%%rip)";
+      else
+        xasm = "%!call\t%P0";
+    }
   else
     xasm = "%!call\t%A0";
 
@@ -46628,6 +46665,9 @@  static const struct attribute_spec ix86_attribute_
     false },
   { "callee_pop_aggregate_return", 1, 1, false, true, true,
     ix86_handle_callee_pop_aggregate_return, true },
+  /* Attribute to avoid calling function via PLT.  */
+  { "noplt", 0, 0, true, false, false, ix86_handle_fndecl_attribute,
+    false },
   /* End element.  */
   { NULL,        0, 0, false, false, false, NULL, false }
 };
Index: doc/extend.texi
===================================================================
--- doc/extend.texi	(revision 223720)
+++ doc/extend.texi	(working copy)
@@ -4858,6 +4858,13 @@  On x86-32 targets, the @code{stdcall} attribute ca
 assume that the called function pops off the stack space used to
 pass arguments, unless it takes a variable number of arguments.
 
+@item noplt
+@cindex @code{noplt} function attribute, x86-64
+@cindex functions whose calls do not go via PLT
+On x86-64 targets. the @code{noplt} attribute causes the compiler to
+call this external function indirectly using a GOT entry and avoid the
+PLT.
+
 @item target (@var{options})
 @cindex @code{target} function attribute
 As discussed in @ref{Common Function Attributes}, this attribute 
Index: testsuite/gcc.target/i386/noplt-1.c
===================================================================
--- testsuite/gcc.target/i386/noplt-1.c	(revision 0)
+++ testsuite/gcc.target/i386/noplt-1.c	(working copy)
@@ -0,0 +1,13 @@ 
+/* { dg-do compile { target x86_64-*-linux* } } */
+
+
+__attribute__ ((noplt))
+void foo();
+
+int main()
+{
+  foo();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */ 
Index: testsuite/gcc.target/i386/noplt-2.c
===================================================================
--- testsuite/gcc.target/i386/noplt-2.c	(revision 0)
+++ testsuite/gcc.target/i386/noplt-2.c	(working copy)
@@ -0,0 +1,13 @@ 
+/* { dg-do compile { target x86_64-*-linux* } } */
+/* { dg-options "-O2" } */
+
+
+__attribute__ ((noplt))
+int foo();
+
+int main()
+{
+  return foo();
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */