Message ID | 1430757479-14241-6-git-send-email-amonakov@ispras.ru |
---|---|
State | New |
On 05/04/2015 10:37 AM, Alexander Monakov wrote:
> This patch introduces option -fno-plt that allows to expand calls that would
> go via PLT to load the address of the function immediately at call site (which
> introduces a GOT load). Cover letter explains the motivation for this patch.
>
> New option documentation for invoke.texi is missing from the patch; if this is
> accepted I'll be happy to send a v2 with documentation added.
>
>   * calls.c (prepare_call_address): Transform PLT call to GOT lookup and
>   indirect call by forcing address into a pseudo with -fno-plt.
>   * common.opt (flag_plt): New option.
OK once you cobble together the invoke.texi changes.

Jeff
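To make the difference concrete, here is a toy model in plain C. It is not real ELF/PLT machinery and not code from the patch: a one-entry array stands in for the GOT, and `plt_stub_0` plays the role of a linker-generated PLT stub that jumps through that entry. With the default PLT path the call site calls the stub; with -fno-plt the call site itself loads the GOT entry and calls through it, skipping the extra hop. All names here are illustrative assumptions.

```c
/* Toy model only: "got" stands in for the Global Offset Table and
   plt_stub_0 for a PLT stub.  Real stubs are linker-generated code. */
typedef int (*fn_t)(int);

static int stub_hops;                  /* trips through a PLT stub */

static int add_one(int x) { return x + 1; }

/* Slot 0, assumed already filled in by the (simulated) dynamic linker. */
static fn_t got[1] = { add_one };

/* What a PLT stub does: jump through its GOT slot. */
static int plt_stub_0(int x)
{
    stub_hops++;
    return got[0](x);
}

/* -fplt (default): the call site makes a direct call to the stub. */
static int call_via_plt(int x)
{
    return plt_stub_0(x);
}

/* -fno-plt: the call site loads the GOT entry itself (the "GOT load"
   from the patch description) and makes an indirect call. */
static int call_via_got(int x)
{
    fn_t target = got[0];
    return target(x);
}
```

Both paths end up in `add_one`; only the PLT path passes through the stub.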
On Mon, May 04, 2015 at 11:34:05AM -0600, Jeff Law wrote:
> OK once you cobble together the invoke.texi changes.

Isn't what Michael/Alan suggested better?  I mean as/ld/compiler changes to
inline the plt slot's first part, then lazy binding will work fine.

	Jakub
On 05/04/2015 11:39 AM, Jakub Jelinek wrote:
> Isn't what Michael/Alan suggested better? I mean as/ld/compiler changes to
> inline the plt slot's first part, then lazy binding will work fine.
I must have missed Alan/Michael's message.

ISTM the win here is that by going through the GOT, you can CSE the GOT
reference and possibly get some more register allocation freedom.  Is that
still the case with Alan/Michael's approach?

jeff
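Jeff's point about CSE can be pictured at the C level. In this sketch (hypothetical names; the counter exists only to make the effect visible), `load_got` stands for the GOT load that -fno-plt exposes at each call site. Once the address is an ordinary value in a pseudo, the optimizer may load it once and reuse it across several calls (CSE, or hoisting it out of a loop as `loop_hoisted` does by hand); a direct call to a PLT stub gives it no such value to reuse.

```c
typedef void (*fn_t)(int);

static int got_loads;   /* how many times the GOT slot was read */
static int total;       /* side effect of the callee */

static void accumulate(int x) { total += x; }

static fn_t got_slot = accumulate;   /* stand-in for one GOT entry */

/* Each simulated GOT load goes through here so it can be counted. */
static fn_t load_got(void)
{
    got_loads++;
    return got_slot;
}

/* No CSE: the address is reloaded for every call. */
static void loop_reload(int n)
{
    for (int i = 0; i < n; i++)
        load_got()(i);
}

/* After CSE/hoisting: one load, the address kept in a local (register). */
static void loop_hoisted(int n)
{
    fn_t f = load_got();
    for (int i = 0; i < n; i++)
        f(i);
}
```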
On Mon, May 04, 2015 at 11:42:20AM -0600, Jeff Law wrote:
> On 05/04/2015 11:39 AM, Jakub Jelinek wrote:
> >Isn't what Michael/Alan suggested better? I mean as/ld/compiler changes to
> >inline the plt slot's first part, then lazy binding will work fine.
> I must have missed Alan/Michael's message.
>
> ISTM the win here is that by going through the GOT, you can CSE the
> GOT reference and possibly get some more register allocation
> freedom. Is that still the case with Alan/Michael's approach?

There are many advantages to 'going through the GOT'; CSE'ing the reference
is just one. The biggest (IMO) is that you can avoid the bad PLT ABI that most
targets have, where making a call to a PLT slot requires the GOT address to be
pre-loaded into a fixed, call-saved register. This precludes sibcalls and
forces many functions that would otherwise not need their own stack frames to
create one just to save the old value of the GOT register. See my blog entry
on the topic here: http://ewontfix.com/18/

Anyone who really wants lazy binding can use -fplt (which is presumably still
the default; I didn't check), but lazy binding should largely be considered
deprecated anyway, since effective use of RELRO protection requires -z now
too, in which case you're paying all the costs (which are considerable!) of
lazy binding support even though you won't get it.

Rich
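Rich's lazy-binding argument can be modelled the same way. In this sketch a NULL GOT entry stands in for an unresolved slot (in a real lazy PLT the entry initially points back into the stub, which then calls the dynamic linker's resolver), `resolve` plays the resolver, and `load_lazy`/`load_now` mimic default lazy binding and linking with -z now. The names and structure are assumptions for illustration. Under -z now every slot is resolved at load time whether or not it is ever called, so the lazy path is never taken, yet every call still goes through the PLT indirection.

```c
#include <assert.h>
#include <stddef.h>

#define NSYMS 3

typedef int (*fn_t)(int);

static int f0(int x) { return x; }
static int f1(int x) { return 2 * x; }
static int f2(int x) { return 3 * x; }

/* Addresses the simulated dynamic linker would look up. */
static const fn_t symbol_table[NSYMS] = { f0, f1, f2 };

static fn_t got[NSYMS];      /* NULL models "not yet resolved" */
static int resolver_calls;   /* symbol lookups performed */

static void resolve(int i)
{
    resolver_calls++;
    got[i] = symbol_table[i];
}

/* Default lazy binding: nothing is resolved at load time. */
static void load_lazy(void)
{
    resolver_calls = 0;
    for (int i = 0; i < NSYMS; i++)
        got[i] = NULL;
}

/* -z now: every slot is resolved up front (after which the GOT could be
   made read-only, which is what full RELRO relies on). */
static void load_now(void)
{
    resolver_calls = 0;
    for (int i = 0; i < NSYMS; i++)
        resolve(i);
}

/* A PLT-style call: resolve on first use, then jump through the slot. */
static int call_via_plt(int i, int x)
{
    assert(i >= 0 && i < NSYMS);
    if (got[i] == NULL)
        resolve(i);
    return got[i](x);
}
```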
> This patch introduces option -fno-plt that allows to expand calls that would
> go via PLT to load the address of the function immediately at call site (which
> introduces a GOT load). Cover letter explains the motivation for this patch.
>
> New option documentation for invoke.texi is missing from the patch; if this is
> accepted I'll be happy to send a v2 with documentation added.
>
>   * calls.c (prepare_call_address): Transform PLT call to GOT lookup and
>   indirect call by forcing address into a pseudo with -fno-plt.
>   * common.opt (flag_plt): New option.
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index b49ac46..cd8b256 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1773,12 +1773,16 @@ Common Report Var(flag_pic,1) Negative(fpie)
>  Generate position-independent code if possible (small mode)
>
>  fpie
>  Common Report Var(flag_pie,1) Negative(fPIC)
>  Generate position-independent code for executables if possible (small mode)
>
> +fplt
> +Common Report Var(flag_plt) Init(1)
> +Use PLT for PIC calls (-fno-plt: load the address from GOT at call site)
> +

This won't play well with LTO, since fplt will become another global flag
while it affects codegen.  I have not yet caught up with the other thread and
H.J.'s work on doing this transparently in the linker, but if this is going
in, please add Optimization to fplt, so that PLT usage can be decided with
per-function granularity.

Honza
> There are many advantages to 'going through the GOT'. CSE'ing the
> reference is just one. The biggest (IMO) is that you can avoid the bad
> PLT ABI that most targets have, where making a call to a PLT slot
> requires the GOT address to be pre-loaded into a fixed, call-saved
> register. This precludes sibcalls and forces many functions which
> otherwise would not need their own stack frames to create one for
> saving the old value of the GOT register. See my blog entry on the
> topic here: http://ewontfix.com/18/

One common pattern I noticed while looking at codegen for speculative
devirtualization is that when we do not inline the virtual call we end up with

  if (ptr == &foo)
    foo ();

which leads to both a GOT lookup to figure out the address of foo and a call
through the PLT.  It would be nice to handle this gracefully.

Note that one of the improvements I want to make to the devirtualization
machinery is to change the code sequence to:

  if (vptr == &expected_vtable)
    foo ();
  else
    vptr[token] ();

to save the vtable lookup.  But this is not possible in all cases: it happens
that there are multiple predicted vtables all agreeing on the particular slot.

Honza
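The two code shapes Honza describes can be written out in C. In this sketch the vtable is an array of function pointers indexed by a slot number (the `token`), and `rect_vtable` plays `expected_vtable`; all types and names are hypothetical. Because the functions here are `static`, this file involves no GOT or PLT at all. In the case Honza describes, `foo` is an exported, preemptible symbol, so the comparison needs its address from the GOT and the direct call goes through the PLT.

```c
typedef struct shape shape;
typedef int (*method_fn)(const shape *);

struct shape {
    const method_fn *vptr;   /* points to a vtable (array of slots) */
    int w, h;
};

enum { AREA_SLOT = 0 };      /* the "token" used to index the vtable */

static int rect_area(const shape *s) { return s->w * s->h; }
static int tri_area(const shape *s)  { return s->w * s->h / 2; }

static const method_fn rect_vtable[] = { rect_area };
static const method_fn tri_vtable[]  = { tri_area };

/* Current shape: load the slot, then compare it with the predicted
   target.  The comparison needs the address of rect_area, and the
   fast path then calls rect_area directly. */
static int area_guard_on_target(const shape *s)
{
    method_fn fn = s->vptr[AREA_SLOT];
    if (fn == rect_area)
        return rect_area(s);
    return fn(s);
}

/* Proposed shape: compare the vtable pointer itself, so the fast path
   skips loading the slot.  This only works when a single vtable is
   predicted; if several predicted vtables share the same slot target,
   the first form is still needed. */
static int area_guard_on_vtable(const shape *s)
{
    if (s->vptr == rect_vtable)
        return rect_area(s);
    return s->vptr[AREA_SLOT](s);
}
```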
On 05/10/2015 10:59 AM, Jan Hubicka wrote:
>> +fplt
>> +Common Report Var(flag_plt) Init(1)
>> +Use PLT for PIC calls (-fno-plt: load the address from GOT at call site)
>> +
>
> This won't play well with LTO since fplt will become another global flag while
> it affects codegen.
I know Richi explained this to me in the past, but I can't remember the
details of why this is bad.  Can you walk me through it again?

jeff
On Mon, May 11, 2015 at 1:36 PM, Jeff Law <law@redhat.com> wrote:
> On 05/10/2015 10:59 AM, Jan Hubicka wrote:
>> This won't play well with LTO since fplt will become another global flag
>> while it affects codegen.
>
> I know Richi explained this to me in the past, but I can't remember the
> details of why this is bad. Can you walk me through it again?
>

I have proposed a different approach:

https://gcc.gnu.org/ml/gcc/2015-05/msg00086.html
> > I know Richi explained this to me in the past, but I can't remember the
> > details of why this is bad. Can you walk me through it again?
> >
>
> I have proposed a different approach:
>
> https://gcc.gnu.org/ml/gcc/2015-05/msg00086.html

The RELAX_PC* approach does indeed look interesting (I still need to catch up
with the thread), but to answer Jeff's question: with LTO we need to handle
stuff like

  gcc a.c -fplt -flto -c -O2
  gcc b.c -fno-plt -flto -c -Os
  gcc a.o b.o -flto

and generally we would like to mimic as closely as possible what happens with
non-LTO builds.  That is, functions originating from a.c should be optimized
at -O2 with the PLT, and functions from b.c should be size-optimized without
the PLT (which makes cross-module inlining fun).

To do so, we now attach an implicit optimization/target node to each function
that stores the flags used to build its unit.  Optimization nodes contain only
those flags that are marked as Optimization.  So in general, if a flag is
about function codegen and we are able to produce functions with different
values of the flag in one unit, we really want to mark it as Optimization (and
decide what we want to do about inlining across the flag boundary).

Not all flags work like this.  For example, -fPIC is a global flag, and
Richi's code in lto-wrapper reads those options from all .o files first and
somehow chooses the prevailing one for the whole program.  In the longer term
we want to eliminate as many of those global flags as possible (for example,
-m32 can stay global, as you cannot mix it with -m64) and also to represent
some of the flags explicitly in the IL, so that inlining across boundaries
works as expected.

Honza
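A rough model of the difference between a global flag and an Optimization flag under LTO, using the a.c/b.c example above. The structs and the inlining policy are hypothetical, not GCC's data structures. With a single global setting, one value (here, arbitrarily, the last object's) applies to every function; with per-function option nodes, each function keeps the setting of the unit it came from, and the inliner has to decide what to do when the settings differ. Refusing to inline across a flag_plt mismatch is just one conservative choice.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-function optimization node. */
struct opt_node {
    int  opt_level;   /* e.g. 2 for -O2 */
    bool opt_size;    /* -Os */
    bool flag_plt;    /* -fplt / -fno-plt */
};

struct function {
    const char *name;
    const struct opt_node *opts;   /* flags of the unit it came from */
};

/* Global-flag handling: one value wins for the whole program.
   "Last object wins" is an arbitrary rule chosen for illustration. */
static bool prevailing_plt(const struct opt_node *units[], size_t n)
{
    bool plt = true;               /* -fplt is the default */
    for (size_t i = 0; i < n; i++)
        plt = units[i]->flag_plt;
    return plt;
}

/* Per-function handling: one possible, conservative inlining policy. */
static bool can_inline(const struct function *caller,
                       const struct function *callee)
{
    return caller->opts->flag_plt == callee->opts->flag_plt;
}
```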
On 04/05/15 17:37, Alexander Monakov wrote:
> This patch introduces option -fno-plt that allows to expand calls that would
> go via PLT to load the address of the function immediately at call site (which
> introduces a GOT load). Cover letter explains the motivation for this patch.
>
> New option documentation for invoke.texi is missing from the patch; if this is
> accepted I'll be happy to send a v2 with documentation added.
>
>   * calls.c (prepare_call_address): Transform PLT call to GOT lookup and
>   indirect call by forcing address into a pseudo with -fno-plt.
>   * common.opt (flag_plt): New option.

I have done a quick experiment: -fno-plt doesn't work on AArch64.  Although
this patch forces the function address into a register, the combine pass,
which runs later, combines it back, because AArch64 defines such an insn
pattern.

On x86 it is not combined back.  From the RTL dump, that is because the RTL
PRE pass has moved the address load instruction into another basic block, and
the combine pass does not combine across basic blocks.  Also, the x86 backend
does some checking of flag_plt in the newly added
ix86_nopic_noplt_attribute_p, which helps it generate correct insns.

The fix I can think of for AArch64 is to allow the call-symbol pattern only
when flag_plt is true, so that a call via a register can't be combined into a
direct call to the symbol.  Or would it be better to prohibit combine from
doing such combining, since a generic fix in combine may also fix other
broken targets?

Thoughts?

Regards,
Jiong
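Jiong's diagnosis can be pictured with a toy combine step. In this sketch (all names hypothetical; this is not GCC's RTL or the AArch64 machine description), combine merges a GOT load into the following register-indirect call, turning it back into a direct call to the symbol whenever the target accepts a call-symbol pattern. Gating that pattern on `flag_plt || binds_local`, along the lines of the first fix Jiong suggests, keeps the -fno-plt sequence intact. Like the real pass, the toy only looks at two adjacent insns in one basic block, which is why the x86 sequence survived once PRE had moved the load into another block.

```c
#include <stdbool.h>
#include <stddef.h>

enum op { LOAD_GOT, CALL_REG, CALL_SYM };

struct insn {
    enum op op;
    int reg;            /* destination (LOAD_GOT) or source (CALL_REG) */
    const char *sym;    /* symbol for LOAD_GOT and CALL_SYM */
};

/* Hypothetical condition on the target's "call a symbol" pattern. */
static bool call_symbol_pattern_ok(bool flag_plt, bool binds_local)
{
    return flag_plt || binds_local;
}

/* Try to combine insns[0] and insns[1], which must be adjacent in the
   same basic block.  Returns the number of insns that remain. */
static size_t try_combine(struct insn insns[2], bool flag_plt,
                          bool binds_local)
{
    if (insns[0].op == LOAD_GOT && insns[1].op == CALL_REG
        && insns[1].reg == insns[0].reg
        && call_symbol_pattern_ok(flag_plt, binds_local)) {
        insns[0].op = CALL_SYM;   /* direct call to insns[0].sym */
        insns[0].reg = -1;
        return 1;
    }
    return 2;
}
```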
diff --git a/gcc/calls.c b/gcc/calls.c
index 970415d..0c3b9aa 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -222,12 +222,18 @@ prepare_call_address (tree fndecl_or_type, rtx funexp, rtx static_chain_value,
     /* If we are using registers for parameters, force the
        function address into a register now.  */
     funexp = ((reg_parm_seen
                && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
               ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
               : memory_address (FUNCTION_MODE, funexp));
+  else if (flag_pic && !flag_plt && fndecl_or_type
+           && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
+           && !targetm.binds_local_p (fndecl_or_type))
+    {
+      funexp = force_reg (Pmode, funexp);
+    }
   else if (! sibcallp)
     {
 #ifndef NO_FUNCTION_CSE
       if (optimize && ! flag_no_function_cse)
         funexp = force_reg (Pmode, funexp);
 #endif
diff --git a/gcc/common.opt b/gcc/common.opt
index b49ac46..cd8b256 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1773,12 +1773,16 @@ Common Report Var(flag_pic,1) Negative(fpie)
 Generate position-independent code if possible (small mode)
 
 fpie
 Common Report Var(flag_pie,1) Negative(fPIC)
 Generate position-independent code for executables if possible (small mode)
 
+fplt
+Common Report Var(flag_plt) Init(1)
+Use PLT for PIC calls (-fno-plt: load the address from GOT at call site)
+
 fplugin=
 Common Joined RejectNegative Var(common_deferred_options) Defer
 Specify a plugin to load
 
 fplugin-arg-
 Common Joined RejectNegative Var(common_deferred_options) Defer
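The branch order in the patched `prepare_call_address` can be summarized with a small standalone model. The struct fields are hypothetical booleans standing in for the RTL/tree checks named in the comments, the two outcomes of the first branch are folded into one action, and the model assumes `NO_FUNCTION_CSE` is not defined. It shows two properties of the new branch: it also fires for sibling calls (the existing CSE branch is skipped when `sibcallp` is set), and it does not depend on `optimize` or -fno-function-cse.

```c
#include <stdbool.h>

/* Simplified inputs; names mirror the diff, but this is not GCC code. */
struct call_ctx {
    bool is_symbol_ref;    /* GET_CODE (funexp) == SYMBOL_REF */
    bool sibcall;          /* sibcallp */
    bool optimize;         /* optimize != 0 */
    bool no_function_cse;  /* flag_no_function_cse */
    bool pic;              /* flag_pic */
    bool plt;              /* flag_plt */
    bool is_fndecl;        /* fndecl_or_type is a FUNCTION_DECL */
    bool binds_local;      /* targetm.binds_local_p (fndecl_or_type) */
};

enum action {
    LEGITIMIZE_ADDRESS,    /* force_not_mem / memory_address branch */
    FORCE_REG_NO_PLT,      /* new -fno-plt branch */
    FORCE_REG_CSE,         /* existing function-CSE branch */
    KEEP_SYMBOL            /* leave the SYMBOL_REF for a direct call */
};

static enum action call_address_action(const struct call_ctx *c)
{
    if (!c->is_symbol_ref)
        return LEGITIMIZE_ADDRESS;
    if (c->pic && !c->plt && c->is_fndecl && !c->binds_local)
        return FORCE_REG_NO_PLT;
    if (!c->sibcall && c->optimize && !c->no_function_cse)
        return FORCE_REG_CSE;
    return KEEP_SYMBOL;
}
```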