diff mbox

Harden powerpc64 elf_machine_fixup_plt

Message ID 20150319235208.GB26234@bubble.grove.modra.org
State New
Headers show

Commit Message

Alan Modra March 19, 2015, 11:52 p.m. UTC
IFUNC is difficult to correctly implement on any target needing a GOT
to support position independent code, due to the dependency on order
of dynamic relocations.  ld.so should be changed to apply IFUNC
relocations last, globally, because without that it is actually
impossible to write an IFUNC resolver in C that works in all
situations.  Case in point, vfork in libpthread.so is an IFUNC with
the resolver returning &__libc_vfork.  (system and fork are similar.)
If another shared library, libA say, uses vfork then it is quite
possible that libpthread.so hasn't been dynamically relocated before
the unfortunate libA is dynamically relocated.  In that case the GOT
entry for &__libc_vfork is still zero, so the IFUNC resolver returns
NULL.  LD_BIND_NOW=1 results in libA PLT dynamic relocations being
applied using this NULL value and ld.so segfaults.

This patch hardens ld.so to not segfault on a NULL from an IFUNC
resolver.  It also fixes a problem with undefined weak.  If you leave
the plt entry as-is for undefined weak then if the entry is ever
called it will loop in ld.so rather than segfaulting.

Regression tested powerpc64-linux.

	* sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_fixup_plt):
	Don't segfault if ifunc resolver returns a NULL.  Do set plt to
	zero for undefined weak.
	(elf_machine_plt_conflict): Similarly.

Comments

Rich Felker March 20, 2015, 3:22 p.m. UTC | #1
On Fri, Mar 20, 2015 at 10:22:08AM +1030, Alan Modra wrote:
> IFUNC is difficult to correctly implement on any target needing a GOT
> to support position independent code, due to the dependency on order
> of dynamic relocations.  ld.so should be changed to apply IFUNC
> relocations last, globally, because without that it is actually
> impossible to write an IFUNC resolver in C that works in all
> situations.

There are still all sorts of issues with IFUNC resolvers in terms of
what they can safely do. I appreciate efforts to improve the current
practical situation, but I don't think we can lay the issue to rest
until there's a proper specification for what IFUNC resolvers can do.
For example even with your ordering requirement, things would still
break if one IFUNC resolver ends up referencing another function that
happens to be resolved via IFUNC.

My preference would be to specify that the IFUNC resolver cannot
access any external-linkage symbols (except possibly with hidden
visibility) or objects with static storage duration. This would be
suitable for resolvers which can obtain a result purely based on cpu
probing (in inline asm), but would not work if AT_HWCAP or similar is
needed. A slightly less restrictive option would be to permit access
to a (possibly target-specific) list of external symbols/static
objects. But I don't know whether this would be sufficient to meet the
needs of existing IFUNC users.

This is really a huge mess -- the fact that a feature like this was
allowed to get "out in the wild" with no clear contract for how it
works, leaving us stuck with a mess of difficult-to-meet
implementation constraints that aren't even clear.

Rich
Carlos O'Donell March 20, 2015, 3:51 p.m. UTC | #2
On 03/20/2015 11:22 AM, Rich Felker wrote:
> On Fri, Mar 20, 2015 at 10:22:08AM +1030, Alan Modra wrote:
>> IFUNC is difficult to correctly implement on any target needing a GOT
>> to support position independent code, due to the dependency on order
>> of dynamic relocations.  ld.so should be changed to apply IFUNC
>> relocations last, globally, because without that it is actually
>> impossible to write an IFUNC resolver in C that works in all
>> situations.
> 
> There are still all sorts of issues with IFUNC resolvers in terms of
> what they can safely do. I appreciate efforts to improve the current
> practical situation, but I don't think we can lay the issue to rest
> until there's a proper specification for what IFUNC resolvers can do.
> For example even with your ordering requirement, things would still
> break if one IFUNC resolver ends up referencing another function that
> happens to be resolved via IFUNC.
> 
> My preference would be to specify that the IFUNC resolver cannot
> access any external-linkage symbols (except possibly with hidden
> visibility) or objects with static storage duration. This would be
> suitable for resolvers which can obtain a result purely based on cpu
> probing (in inline asm), but would not work if AT_HWCAP or similar is
> needed. A slightly less restrictive option would be to permit access
> to a (possibly target-specific) list of external symbols/static
> objects. But I don't know whether this would be sufficient to meet the
> needs of existing IFUNC users.
> 
> This is really a huge mess -- the fact that a feature like this was
> allowed to get "out in the wild" with no clear contract for how it
> works, leaving us stuck with a mess of difficult-to-meet
> implementation constraints that aren't even clear.

In complete agreement.

Unfortunately no core developer wants to step up and fix this, and
thus it remains a poorly understood and difficult to use feature.

Cheers,
Carlos.
H.J. Lu March 20, 2015, 3:58 p.m. UTC | #3
On Fri, Mar 20, 2015 at 8:51 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 03/20/2015 11:22 AM, Rich Felker wrote:
>> On Fri, Mar 20, 2015 at 10:22:08AM +1030, Alan Modra wrote:
>>> IFUNC is difficult to correctly implement on any target needing a GOT
>>> to support position independent code, due to the dependency on order
>>> of dynamic relocations.  ld.so should be changed to apply IFUNC
>>> relocations last, globally, because without that it is actually
>>> impossible to write an IFUNC resolver in C that works in all
>>> situations.
>>
>> There are still all sorts of issues with IFUNC resolvers in terms of
>> what they can safely do. I appreciate efforts to improve the current
>> practical situation, but I don't think we can lay the issue to rest
>> until there's a proper specification for what IFUNC resolvers can do.
>> For example even with your ordering requirement, things would still
>> break if one IFUNC resolver ends up referencing another function that
>> happens to be resolved via IFUNC.
>>
>> My preference would be to specify that the IFUNC resolver cannot
>> access any external-linkage symbols (except possibly with hidden
>> visibility) or objects with static storage duration. This would be
>> suitable for resolvers which can obtain a result purely based on cpu
>> probing (in inline asm), but would not work if AT_HWCAP or similar is
>> needed. A slightly less restrictive option would be to permit access
>> to a (possibly target-specific) list of external symbols/static
>> objects. But I don't know whether this would be sufficient to meet the
>> needs of existing IFUNC users.
>>
>> This is really a huge mess -- the fact that a feature like this was
>> allowed to get "out in the wild" with no clear contract for how it
>> works, leaving us stuck with a mess of difficult-to-meet
>> implementation constraints that aren't even clear.
>
> In complete agreement.
>
> Unfortunately no core developer wants to step up and fix this, and
> thus it remains a poorly understood and difficult to use feature.
>

IFUNC is processor specific. I have some old docs for x86 IFUNC:

https://sites.google.com/site/x32abi/documents/ifunc.txt?attredirects=0&d=1

I can update IFUNC spec in x86 and x86-64 psABIs.  Any feedbacks
are more than welcome.  You can send your comments to

https://groups.google.com/forum/#!forum/x86-64-abi

Thanks.
Mike Frysinger March 20, 2015, 10:02 p.m. UTC | #4
On 20 Mar 2015 08:58, H.J. Lu wrote:
> On Fri, Mar 20, 2015 at 8:51 AM, Carlos O'Donell wrote:
> > On 03/20/2015 11:22 AM, Rich Felker wrote:
> >> On Fri, Mar 20, 2015 at 10:22:08AM +1030, Alan Modra wrote:
> >>> IFUNC is difficult to correctly implement on any target needing a GOT
> >>> to support position independent code, due to the dependency on order
> >>> of dynamic relocations.  ld.so should be changed to apply IFUNC
> >>> relocations last, globally, because without that it is actually
> >>> impossible to write an IFUNC resolver in C that works in all
> >>> situations.
> >>
> >> There are still all sorts of issues with IFUNC resolvers in terms of
> >> what they can safely do. I appreciate efforts to improve the current
> >> practical situation, but I don't think we can lay the issue to rest
> >> until there's a proper specification for what IFUNC resolvers can do.
> >> For example even with your ordering requirement, things would still
> >> break if one IFUNC resolver ends up referencing another function that
> >> happens to be resolved via IFUNC.
> >>
> >> My preference would be to specify that the IFUNC resolver cannot
> >> access any external-linkage symbols (except possibly with hidden
> >> visibility) or objects with static storage duration. This would be
> >> suitable for resolvers which can obtain a result purely based on cpu
> >> probing (in inline asm), but would not work if AT_HWCAP or similar is
> >> needed. A slightly less restrictive option would be to permit access
> >> to a (possibly target-specific) list of external symbols/static
> >> objects. But I don't know whether this would be sufficient to meet the
> >> needs of existing IFUNC users.
> >>
> >> This is really a huge mess -- the fact that a feature like this was
> >> allowed to get "out in the wild" with no clear contract for how it
> >> works, leaving us stuck with a mess of difficult-to-meet
> >> implementation constraints that aren't even clear.
> >
> > In complete agreement.
> >
> > Unfortunately no core developer wants to step up and fix this, and
> > thus it remains a poorly understood and difficult to use feature.
> >
> 
> IFUNC is processor specific. I have some old docs for x86 IFUNC:

the relocs might be, but that doesn't mean the overall process/ABI contract is.  
it's crazy that the same C code might work/not work depending on the CPU/C lib.
-mike
Alan Modra March 23, 2015, 3:35 a.m. UTC | #5
On Fri, Mar 20, 2015 at 06:02:56PM -0400, Mike Frysinger wrote:
> On 20 Mar 2015 08:58, H.J. Lu wrote:
> > On Fri, Mar 20, 2015 at 8:51 AM, Carlos O'Donell wrote:
> > > Unfortunately no core developer wants to step up and fix this, and
> > > thus it remains a poorly understood and difficult to use feature.

The solutions I see are
1) Run two passes over dynamic relocations.  Who wants to be
   responsible for slowing down app startup?
2) Like (1) but on the first pass put ifunc relocations on a worklist.
3) Recursive dynamic relocation.  ie. before running an ifunc
   resolver, check that it has been relocated, and if not, resolve its
   shared library.  Probably not feasible due to loops in reloc
   dependencies.
4) Accept that you may need to write ifunc resolvers in assembly.
   They cannot use the GOT unless there is an ironclad guarantee that
   they won't be called before they themselves are relocated.

> > IFUNC is processor specific. I have some old docs for x86 IFUNC:
> 
> the relocs might be, but that doesn't mean the overall process/ABI contract is.  
> it's crazy that the same C code might work/not work depending on the CPU/C lib.

The really bad news is that it is trivial to construct a testcase that
shows current C ifunc resolvers, even in libc.so, won't work on
targets that must use the GOT to return the address of a function.

cat > hello.c <<EOF
#include <stdio.h>

extern int world (int) __attribute__ ((weak));

void
print (const char *str)
{
  puts (str);
}

int
main ()
{
  print ("Hello");
  if (world)
    return world (1);
  return 0;
}
EOF
cat > world.c <<EOF
extern void print (const char *);

int
world (int x)
{
  print ("world");
  return x;
}
EOF
gcc -o world.so -fPIC -shared -nostdlib world.c
gcc -o hello -Wl,-no-as-needed,-rpath,. hello.c -lc world.so
LD_DEBUG=reloc ./hello 2>&1 | grep relocation

The debug output will show world.so being relocated first, before
libc.so.  So if world.so happened to take the address of a libc.so
ifunc, or call a libc.so ifunc with LD_BIND_NOW=1, then the libc.so
ifunc resolver would be called *before libc.so has been relocated*.

Now the only thing I did a little unusual was to link "world.so" with
-nostdlib.  I also specified library order when linking "hello" in an
evil way that I happen to know will cause a problem.  Users can easily
do that by accident..

When you start looking at a current libm.so and libpthread.so, you'll
see ifunc again, and the chance of users getting the wrong dynamic
relocation order increases.
Alan Modra March 24, 2015, 2:55 a.m. UTC | #6
On Fri, Mar 20, 2015 at 11:22:45AM -0400, Rich Felker wrote:
> On Fri, Mar 20, 2015 at 10:22:08AM +1030, Alan Modra wrote:
> > IFUNC is difficult to correctly implement on any target needing a GOT
> > to support position independent code, due to the dependency on order
> > of dynamic relocations.  ld.so should be changed to apply IFUNC
> > relocations last, globally, because without that it is actually
> > impossible to write an IFUNC resolver in C that works in all
> > situations.
> 
> There are still all sorts of issues with IFUNC resolvers in terms of
> what they can safely do. I appreciate efforts to improve the current
> practical situation, but I don't think we can lay the issue to rest
> until there's a proper specification for what IFUNC resolvers can do.
> For example even with your ordering requirement, things would still
> break if one IFUNC resolver ends up referencing another function that
> happens to be resolved via IFUNC.

Yes, that would be unfortunate.  "Don't call non-local functions in an
ifunc resolver" sounds like a good rule.  Certainly for user code,
where you don't know whether an external function may use ifunc (or
might in some future version of a library).

> My preference would be to specify that the IFUNC resolver cannot
> access any external-linkage symbols (except possibly with hidden
> visibility) or objects with static storage duration.

Right, that helps, and covers calling a non-local function too.
However, an ifunc resolver must return an address.  Targets that don't
have pc-relative addressing, and don't implement some form of
gp-relative addressing, calculate even local PIC addresses by loading
a GOT entry.  That's the fundamental problem of C ifunc resolvers, and
why I say it is impossible to write ifunc resolvers in C that work in
all situations.
Richard Henderson March 24, 2015, 2:47 p.m. UTC | #7
On 03/23/2015 07:55 PM, Alan Modra wrote:
> However, an ifunc resolver must return an address.  Targets that don't
> have pc-relative addressing, and don't implement some form of
> gp-relative addressing, calculate even local PIC addresses by loading
> a GOT entry.

True.  But easily fixable by mandating the implementation of gp-relative
addressing when implementing ifunc.  Do we have any left that miss this
criteria?  Most of the ones that don't have gp-rel also don't have ifunc.


r~
Alan Modra March 25, 2015, 12:05 a.m. UTC | #8
On Tue, Mar 24, 2015 at 07:47:47AM -0700, Richard Henderson wrote:
> On 03/23/2015 07:55 PM, Alan Modra wrote:
> > However, an ifunc resolver must return an address.  Targets that don't
> > have pc-relative addressing, and don't implement some form of
> > gp-relative addressing, calculate even local PIC addresses by loading
> > a GOT entry.
> 
> True.  But easily fixable by mandating the implementation of gp-relative
> addressing when implementing ifunc.  Do we have any left that miss this
> criteria?

powerpc is in this category.  powerpc64 -mcmodel=small or
-mcmodel=large also.  hppa also, I think.
Steven Munroe March 26, 2015, 12:29 a.m. UTC | #9
On Fri, 2015-03-20 at 10:22 +1030, Alan Modra wrote:
> IFUNC is difficult to correctly implement on any target needing a GOT
> to support position independent code, due to the dependency on order
> of dynamic relocations.  ld.so should be changed to apply IFUNC
> relocations last, globally, because without that it is actually
> impossible to write an IFUNC resolver in C that works in all
> situations.  Case in point, vfork in libpthread.so is an IFUNC with
> the resolver returning &__libc_vfork.  (system and fork are similar.)
> If another shared library, libA say, uses vfork then it is quite
> possible that libpthread.so hasn't been dynamically relocated before
> the unfortunate libA is dynamically relocated.  In that case the GOT
> entry for &__libc_vfork is still zero, so the IFUNC resolver returns
> NULL.  LD_BIND_NOW=1 results in libA PLT dynamic relocations being
> applied using this NULL value and ld.so segfaults.
> 
> This patch hardens ld.so to not segfault on a NULL from an IFUNC
> resolver.  It also fixes a problem with undefined weak.  If you leave
> the plt entry as-is for undefined weak then if the entry is ever
> called it will loop in ld.so rather than segfaulting.
> 
> Regression tested powerpc64-linux.
> 
> 	* sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_fixup_plt):
> 	Don't segfault if ifunc resolver returns a NULL.  Do set plt to
> 	zero for undefined weak.
> 	(elf_machine_plt_conflict): Similarly.
> 

I have reviewed this patch and agree it should committed.
diff mbox

Patch

diff --git a/sysdeps/powerpc/powerpc64/dl-machine.h b/sysdeps/powerpc/powerpc64/dl-machine.h
index 55ac736..0576781 100644
--- a/sysdeps/powerpc/powerpc64/dl-machine.h
+++ b/sysdeps/powerpc/powerpc64/dl-machine.h
@@ -472,19 +472,32 @@  elf_machine_fixup_plt (struct link_map *map, lookup_t sym_map,
   Elf64_FuncDesc *plt = (Elf64_FuncDesc *) reloc_addr;
   Elf64_FuncDesc *rel = (Elf64_FuncDesc *) finaladdr;
   Elf64_Addr offset = 0;
+  Elf64_FuncDesc zero_fd = {0, 0, 0};
 
   PPC_DCBT (&plt->fd_aux);
   PPC_DCBT (&plt->fd_func);
-  PPC_DCBT (&rel->fd_aux);
-  PPC_DCBT (&rel->fd_func);
 
-  /* If sym_map is NULL, it's a weak undefined sym;  Leave the plt zero.  */
+  /* If sym_map is NULL, it's a weak undefined sym;  Set the plt to
+     zero.  finaladdr should be zero already in this case, but guard
+     against invalid plt relocations with non-zero addends.  */
   if (sym_map == NULL)
-    return 0;
+    finaladdr = 0;
+
+  /* Don't die here if finaladdr is zero, die if this plt entry is
+     actually called.  Makes a difference when LD_BIND_NOW=1.
+     finaladdr may be zero for a weak undefined symbol, or when an
+     ifunc resolver returns zero.  */
+  if (finaladdr == 0)
+    rel = &zero_fd;
+  else
+    {
+      PPC_DCBT (&rel->fd_aux);
+      PPC_DCBT (&rel->fd_func);
+    }
 
   /* If the opd entry is not yet relocated (because it's from a shared
      object that hasn't been processed yet), then manually reloc it.  */
-  if (map != sym_map && !sym_map->l_relocated
+  if (finaladdr != 0 && map != sym_map && !sym_map->l_relocated
 #if !defined RTLD_BOOTSTRAP && defined SHARED
       /* Bootstrap map doesn't have l_relocated set for it.  */
       && sym_map != &GL(dl_rtld_map)
@@ -522,6 +535,13 @@  elf_machine_plt_conflict (struct link_map *map, lookup_t sym_map,
 #if _CALL_ELF != 2
   Elf64_FuncDesc *plt = (Elf64_FuncDesc *) reloc_addr;
   Elf64_FuncDesc *rel = (Elf64_FuncDesc *) finaladdr;
+  Elf64_FuncDesc zero_fd = {0, 0, 0};
+
+  if (sym_map == NULL)
+    finaladdr = 0;
+
+  if (finaladdr == 0)
+    rel = &zero_fd;
 
   plt->fd_func = rel->fd_func;
   plt->fd_aux = rel->fd_aux;