Message ID | 20150319235208.GB26234@bubble.grove.modra.org |
---|---|
State | New |
Headers | show |
On Fri, Mar 20, 2015 at 10:22:08AM +1030, Alan Modra wrote: > IFUNC is difficult to correctly implement on any target needing a GOT > to support position independent code, due to the dependency on order > of dynamic relocations. ld.so should be changed to apply IFUNC > relocations last, globally, because without that it is actually > impossible to write an IFUNC resolver in C that works in all > situations. There are still all sorts of issues with IFUNC resolvers in terms of what they can safely do. I appreciate efforts to improve the current practical situation, but I don't think we can lay the issue to rest until there's a proper specification for what IFUNC resolvers can do. For example even with your ordering requirement, things would still break if one IFUNC resolver ends up referencing another function that happens to be resolved via IFUNC. My preference would be to specify that the IFUNC resolver cannot access any external-linkage symbols (except possibly with hidden visibility) or objects with static storage duration. This would be suitable for resolvers which can obtain a result purely based on cpu probing (in inline asm), but would not work if AT_HWCAP or similar is needed. A slightly less restrictive option would be to permit access to a (possibly target-specific) list of external symbols/static objects. But I don't know whether this would be sufficient to meet the needs of existing IFUNC users. This is really a huge mess -- the fact that a feature like this was allowed to get "out in the wild" with no clear contract for how it works, leaving us stuck with a mess of difficult-to-meet implementation constraints that aren't even clear. Rich
On 03/20/2015 11:22 AM, Rich Felker wrote: > On Fri, Mar 20, 2015 at 10:22:08AM +1030, Alan Modra wrote: >> IFUNC is difficult to correctly implement on any target needing a GOT >> to support position independent code, due to the dependency on order >> of dynamic relocations. ld.so should be changed to apply IFUNC >> relocations last, globally, because without that it is actually >> impossible to write an IFUNC resolver in C that works in all >> situations. > > There are still all sorts of issues with IFUNC resolvers in terms of > what they can safely do. I appreciate efforts to improve the current > practical situation, but I don't think we can lay the issue to rest > until there's a proper specification for what IFUNC resolvers can do. > For example even with your ordering requirement, things would still > break if one IFUNC resolver ends up referencing another function that > happens to be resolved via IFUNC. > > My preference would be to specify that the IFUNC resolver cannot > access any external-linkage symbols (except possibly with hidden > visibility) or objects with static storage duration. This would be > suitable for resolvers which can obtain a result purely based on cpu > probing (in inline asm), but would not work if AT_HWCAP or similar is > needed. A slightly less restrictive option would be to permit access > to a (possibly target-specific) list of external symbols/static > objects. But I don't know whether this would be sufficient to meet the > needs of existing IFUNC users. > > This is really a huge mess -- the fact that a feature like this was > allowed to get "out in the wild" with no clear contract for how it > works, leaving us stuck with a mess of difficult-to-meet > implementation constraints that aren't even clear. In complete agreement. Unfortunately no core developer wants to step up and fix this, and thus it remains a poorly understood and difficult to use feature. Cheers, Carlos.
On Fri, Mar 20, 2015 at 8:51 AM, Carlos O'Donell <carlos@redhat.com> wrote: > On 03/20/2015 11:22 AM, Rich Felker wrote: >> On Fri, Mar 20, 2015 at 10:22:08AM +1030, Alan Modra wrote: >>> IFUNC is difficult to correctly implement on any target needing a GOT >>> to support position independent code, due to the dependency on order >>> of dynamic relocations. ld.so should be changed to apply IFUNC >>> relocations last, globally, because without that it is actually >>> impossible to write an IFUNC resolver in C that works in all >>> situations. >> >> There are still all sorts of issues with IFUNC resolvers in terms of >> what they can safely do. I appreciate efforts to improve the current >> practical situation, but I don't think we can lay the issue to rest >> until there's a proper specification for what IFUNC resolvers can do. >> For example even with your ordering requirement, things would still >> break if one IFUNC resolver ends up referencing another function that >> happens to be resolved via IFUNC. >> >> My preference would be to specify that the IFUNC resolver cannot >> access any external-linkage symbols (except possibly with hidden >> visibility) or objects with static storage duration. This would be >> suitable for resolvers which can obtain a result purely based on cpu >> probing (in inline asm), but would not work if AT_HWCAP or similar is >> needed. A slightly less restrictive option would be to permit access >> to a (possibly target-specific) list of external symbols/static >> objects. But I don't know whether this would be sufficient to meet the >> needs of existing IFUNC users. >> >> This is really a huge mess -- the fact that a feature like this was >> allowed to get "out in the wild" with no clear contract for how it >> works, leaving us stuck with a mess of difficult-to-meet >> implementation constraints that aren't even clear. > > In complete agreement. > > Unfortunately no core developer wants to step up and fix this, and > thus it remains a poorly understood and difficult to use feature. > IFUNC is processor specific. I have some old docs for x86 IFUNC: https://sites.google.com/site/x32abi/documents/ifunc.txt?attredirects=0&d=1 I can update IFUNC spec in x86 and x86-64 psABIs. Any feedbacks are more than welcome. You can send your comments to https://groups.google.com/forum/#!forum/x86-64-abi Thanks.
On 20 Mar 2015 08:58, H.J. Lu wrote: > On Fri, Mar 20, 2015 at 8:51 AM, Carlos O'Donell wrote: > > On 03/20/2015 11:22 AM, Rich Felker wrote: > >> On Fri, Mar 20, 2015 at 10:22:08AM +1030, Alan Modra wrote: > >>> IFUNC is difficult to correctly implement on any target needing a GOT > >>> to support position independent code, due to the dependency on order > >>> of dynamic relocations. ld.so should be changed to apply IFUNC > >>> relocations last, globally, because without that it is actually > >>> impossible to write an IFUNC resolver in C that works in all > >>> situations. > >> > >> There are still all sorts of issues with IFUNC resolvers in terms of > >> what they can safely do. I appreciate efforts to improve the current > >> practical situation, but I don't think we can lay the issue to rest > >> until there's a proper specification for what IFUNC resolvers can do. > >> For example even with your ordering requirement, things would still > >> break if one IFUNC resolver ends up referencing another function that > >> happens to be resolved via IFUNC. > >> > >> My preference would be to specify that the IFUNC resolver cannot > >> access any external-linkage symbols (except possibly with hidden > >> visibility) or objects with static storage duration. This would be > >> suitable for resolvers which can obtain a result purely based on cpu > >> probing (in inline asm), but would not work if AT_HWCAP or similar is > >> needed. A slightly less restrictive option would be to permit access > >> to a (possibly target-specific) list of external symbols/static > >> objects. But I don't know whether this would be sufficient to meet the > >> needs of existing IFUNC users. > >> > >> This is really a huge mess -- the fact that a feature like this was > >> allowed to get "out in the wild" with no clear contract for how it > >> works, leaving us stuck with a mess of difficult-to-meet > >> implementation constraints that aren't even clear. > > > > In complete agreement. > > > > Unfortunately no core developer wants to step up and fix this, and > > thus it remains a poorly understood and difficult to use feature. > > > > IFUNC is processor specific. I have some old docs for x86 IFUNC: the relocs might be, but that doesn't mean the overall process/ABI contract is. it's crazy that the same C code might work/not work depending on the CPU/C lib. -mike
On Fri, Mar 20, 2015 at 06:02:56PM -0400, Mike Frysinger wrote: > On 20 Mar 2015 08:58, H.J. Lu wrote: > > On Fri, Mar 20, 2015 at 8:51 AM, Carlos O'Donell wrote: > > > Unfortunately no core developer wants to step up and fix this, and > > > thus it remains a poorly understood and difficult to use feature. The solutions I see are 1) Run two passes over dynamic relocations. Who wants to be responsible for slowing down app startup? 2) Like (1) but on the first pass put ifunc relocations on a worklist. 3) Recursive dynamic relocation. ie. before running an ifunc resolver, check that it has been relocated, and if not, resolve its shared library. Probably not feasible due to loops in reloc dependencies. 4) Accept that you may need to write ifunc resolvers in assembly. They cannot use the GOT unless there is an ironclad guarantee that they won't be called before they themselves are relocated. > > IFUNC is processor specific. I have some old docs for x86 IFUNC: > > the relocs might be, but that doesn't mean the overall process/ABI contract is. > it's crazy that the same C code might work/not work depending on the CPU/C lib. The really bad news is that it is trivial to construct a testcase that shows current C ifunc resolvers, even in libc.so, won't work on targets that must use the GOT to return the address of a function. cat > hello.c <<EOF #include <stdio.h> extern int world (int) __attribute__ ((weak)); void print (const char *str) { puts (str); } int main () { print ("Hello"); if (world) return world (1); return 0; } EOF cat > world.c <<EOF extern void print (const char *); int world (int x) { print ("world"); return x; } EOF gcc -o world.so -fPIC -shared -nostdlib world.c gcc -o hello -Wl,-no-as-needed,-rpath,. hello.c -lc world.so LD_DEBUG=reloc ./hello 2>&1 | grep relocation The debug output will show world.so being relocated first, before libc.so. So if world.so happened to take the address of a libc.so ifunc, or call a libc.so ifunc with LD_BIND_NOW=1, then the libc.so ifunc resolver would be called *before libc.so has been relocated*. Now the only thing I did a little unusual was to link "world.so" with -nostdlib. I also specified library order when linking "hello" in an evil way that I happen to know will cause a problem. Users can easily do that by accident.. When you start looking at a current libm.so and libpthread.so, you'll see ifunc again, and the chance of users getting the wrong dynamic relocation order increases.
On Fri, Mar 20, 2015 at 11:22:45AM -0400, Rich Felker wrote: > On Fri, Mar 20, 2015 at 10:22:08AM +1030, Alan Modra wrote: > > IFUNC is difficult to correctly implement on any target needing a GOT > > to support position independent code, due to the dependency on order > > of dynamic relocations. ld.so should be changed to apply IFUNC > > relocations last, globally, because without that it is actually > > impossible to write an IFUNC resolver in C that works in all > > situations. > > There are still all sorts of issues with IFUNC resolvers in terms of > what they can safely do. I appreciate efforts to improve the current > practical situation, but I don't think we can lay the issue to rest > until there's a proper specification for what IFUNC resolvers can do. > For example even with your ordering requirement, things would still > break if one IFUNC resolver ends up referencing another function that > happens to be resolved via IFUNC. Yes, that would be unfortunate. "Don't call non-local functions in an ifunc resolver" sounds like a good rule. Certainly for user code, where you don't know whether an external function may use ifunc (or might in some future version of a library). > My preference would be to specify that the IFUNC resolver cannot > access any external-linkage symbols (except possibly with hidden > visibility) or objects with static storage duration. Right, that helps, and covers calling a non-local function too. However, an ifunc resolver must return an address. Targets that don't have pc-relative addressing, and don't implement some form of gp-relative addressing, calculate even local PIC addresses by loading a GOT entry. That's the fundamental problem of C ifunc resolvers, and why I say it is impossible to write ifunc resolvers in C that work in all situations.
On 03/23/2015 07:55 PM, Alan Modra wrote: > However, an ifunc resolver must return an address. Targets that don't > have pc-relative addressing, and don't implement some form of > gp-relative addressing, calculate even local PIC addresses by loading > a GOT entry. True. But easily fixable by mandating the implementation of gp-relative addressing when implementing ifunc. Do we have any left that miss this criteria? Most of the ones that don't have gp-rel also don't have ifunc. r~
On Tue, Mar 24, 2015 at 07:47:47AM -0700, Richard Henderson wrote: > On 03/23/2015 07:55 PM, Alan Modra wrote: > > However, an ifunc resolver must return an address. Targets that don't > > have pc-relative addressing, and don't implement some form of > > gp-relative addressing, calculate even local PIC addresses by loading > > a GOT entry. > > True. But easily fixable by mandating the implementation of gp-relative > addressing when implementing ifunc. Do we have any left that miss this > criteria? powerpc is in this category. powerpc64 -mcmodel=small or -mcmodel=large also. hppa also, I think.
On Fri, 2015-03-20 at 10:22 +1030, Alan Modra wrote: > IFUNC is difficult to correctly implement on any target needing a GOT > to support position independent code, due to the dependency on order > of dynamic relocations. ld.so should be changed to apply IFUNC > relocations last, globally, because without that it is actually > impossible to write an IFUNC resolver in C that works in all > situations. Case in point, vfork in libpthread.so is an IFUNC with > the resolver returning &__libc_vfork. (system and fork are similar.) > If another shared library, libA say, uses vfork then it is quite > possible that libpthread.so hasn't been dynamically relocated before > the unfortunate libA is dynamically relocated. In that case the GOT > entry for &__libc_vfork is still zero, so the IFUNC resolver returns > NULL. LD_BIND_NOW=1 results in libA PLT dynamic relocations being > applied using this NULL value and ld.so segfaults. > > This patch hardens ld.so to not segfault on a NULL from an IFUNC > resolver. It also fixes a problem with undefined weak. If you leave > the plt entry as-is for undefined weak then if the entry is ever > called it will loop in ld.so rather than segfaulting. > > Regression tested powerpc64-linux. > > * sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_fixup_plt): > Don't segfault if ifunc resolver returns a NULL. Do set plt to > zero for undefined weak. > (elf_machine_plt_conflict): Similarly. > I have reviewed this patch and agree it should committed.
diff --git a/sysdeps/powerpc/powerpc64/dl-machine.h b/sysdeps/powerpc/powerpc64/dl-machine.h index 55ac736..0576781 100644 --- a/sysdeps/powerpc/powerpc64/dl-machine.h +++ b/sysdeps/powerpc/powerpc64/dl-machine.h @@ -472,19 +472,32 @@ elf_machine_fixup_plt (struct link_map *map, lookup_t sym_map, Elf64_FuncDesc *plt = (Elf64_FuncDesc *) reloc_addr; Elf64_FuncDesc *rel = (Elf64_FuncDesc *) finaladdr; Elf64_Addr offset = 0; + Elf64_FuncDesc zero_fd = {0, 0, 0}; PPC_DCBT (&plt->fd_aux); PPC_DCBT (&plt->fd_func); - PPC_DCBT (&rel->fd_aux); - PPC_DCBT (&rel->fd_func); - /* If sym_map is NULL, it's a weak undefined sym; Leave the plt zero. */ + /* If sym_map is NULL, it's a weak undefined sym; Set the plt to + zero. finaladdr should be zero already in this case, but guard + against invalid plt relocations with non-zero addends. */ if (sym_map == NULL) - return 0; + finaladdr = 0; + + /* Don't die here if finaladdr is zero, die if this plt entry is + actually called. Makes a difference when LD_BIND_NOW=1. + finaladdr may be zero for a weak undefined symbol, or when an + ifunc resolver returns zero. */ + if (finaladdr == 0) + rel = &zero_fd; + else + { + PPC_DCBT (&rel->fd_aux); + PPC_DCBT (&rel->fd_func); + } /* If the opd entry is not yet relocated (because it's from a shared object that hasn't been processed yet), then manually reloc it. */ - if (map != sym_map && !sym_map->l_relocated + if (finaladdr != 0 && map != sym_map && !sym_map->l_relocated #if !defined RTLD_BOOTSTRAP && defined SHARED /* Bootstrap map doesn't have l_relocated set for it. */ && sym_map != &GL(dl_rtld_map) @@ -522,6 +535,13 @@ elf_machine_plt_conflict (struct link_map *map, lookup_t sym_map, #if _CALL_ELF != 2 Elf64_FuncDesc *plt = (Elf64_FuncDesc *) reloc_addr; Elf64_FuncDesc *rel = (Elf64_FuncDesc *) finaladdr; + Elf64_FuncDesc zero_fd = {0, 0, 0}; + + if (sym_map == NULL) + finaladdr = 0; + + if (finaladdr == 0) + rel = &zero_fd; plt->fd_func = rel->fd_func; plt->fd_aux = rel->fd_aux;