mbox series

[RFC,0/3] implement dlmopen hooks for gdb

Message ID 20200626193228.1953-1-danielwa@cisco.com
Headers show
Series implement dlmopen hooks for gdb | expand

Message

Daniel Walker \(danielwa\) June 26, 2020, 7:32 p.m. UTC
Cisco System, Inc. has a need to have dlmopen support in gdb, which
required glibc changes. I think it was known when glibc implemented
dlmopen that gdb would not work with it.

Since 2015 Cisco has had these patches in our inventor to fix issues in
glibc which prevented this type of gdb usage.

This RFC is mainly to get guidance on this implementation. We have some
individuals who have signed the copyright assignment for glibc, and we
will submit these (or different patches) formally thru those channels if
no one has issues with the implementation.

Also included in this are a couple of fixes which went along with the
original implementation.

Please provide any comments you might have.

Conan C Huang (3):
  Segfault when dlopen with RTLD_GLOBAL in dlmopened library
  glibc: dlopen RTLD_NOLOAD optimization
  add r_debug multiple namespaces support

 elf/dl-close.c |  7 ++++++-
 elf/dl-debug.c | 13 ++++++++++---
 elf/dl-open.c  |  8 +++++++-
 elf/link.h     |  4 ++++
 4 files changed, 27 insertions(+), 5 deletions(-)

Comments

Carlos O'Donell June 26, 2020, 9:17 p.m. UTC | #1
On 6/26/20 3:32 PM, Daniel Walker via Libc-alpha wrote:
> Cisco System, Inc. has a need to have dlmopen support in gdb, which
> required glibc changes. I think it was known when glibc implemented
> dlmopen that gdb would not work with it.
> 
> Since 2015 Cisco has had these patches in our inventor to fix issues in
> glibc which prevented this type of gdb usage.
> 
> This RFC is mainly to get guidance on this implementation. We have some
> individuals who have signed the copyright assignment for glibc, and we
> will submit these (or different patches) formally thru those channels if
> no one has issues with the implementation.
> 
> Also included in this are a couple of fixes which went along with the
> original implementation.
> 
> Please provide any comments you might have.
> 
> Conan C Huang (3):
>   Segfault when dlopen with RTLD_GLOBAL in dlmopened library
>   glibc: dlopen RTLD_NOLOAD optimization
>   add r_debug multiple namespaces support
> 
>  elf/dl-close.c |  7 ++++++-
>  elf/dl-debug.c | 13 ++++++++++---
>  elf/dl-open.c  |  8 +++++++-
>  elf/link.h     |  4 ++++
>  4 files changed, 27 insertions(+), 5 deletions(-)
> 

Thanks for looking at this. It is something the community would
absolutely like to see. I'll comment quickly to provide direction.

Florian Weimer, Pedro Alves, and I were talking about this as
recently as April where we tried to agree to just adding a
_r_debug_dlmopen with a new ABI for the debugger to use.

Your proposed solution of bumping the version is unacceptable,
and was last rejected by Roland McGrath. The problem is that
when you bump the version the current 

It is easier from
a backwards compatibility perspective to add a new _r_debug_dlmopen
and use that instead.

gdb checks for r_version != 1 and issues a warning, but keeps going:

6952           if (linux_read_memory (priv->r_debug + lmo->r_version_offset,
6953                                  (unsigned char *) &r_version,
6954                                  sizeof (r_version)) != 0
6955               || r_version != 1)
                      ^^^^^^^^^^^^^^
6956             {
6957               warning ("unexpected r_debug version %d", r_version);
6958             }

This is bad precedent that other software might have hard checks
for r_version != 1 stop operating correclty.

I suggest reviewing these threads:
https://sourceware.org/legacy-ml/libc-alpha/2012-11/msg00182.html
https://sourceware.org/legacy-ml/libc-alpha/2012-12/msg00278.html
https://sourceware.org/legacy-ml/libc-alpha/2013-01/msg00045.html

An alternative suggested in 2012 was to add a new DT_* entry to point
to the extended debug information e.g. DT_DEBUG_EXTENDED, and so avoid
needing ld.so for lookup of _r_debug_dlmopen.  Gary Benson also suggests
versioning the new structure, but being very clear what a "version bump"
means, in that we compatible add elements to the end after each version
change. So all consumers would check _r_debug_dlmopen.r_version > 1 to
know they had at least v1 elements.

And for reference from Solaris:
https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-1247.html#chapter6-15
I'd want to avoid having to run code to get at these objects, since
experience has shown this is always going to cause problems. Having
an entirely data-driven approach would be preferable, but locks us
into an ABI that we have to be able to bump.
Florian Weimer June 26, 2020, 9:30 p.m. UTC | #2
* Daniel Walker via Libc-alpha:

> Also included in this are a couple of fixes which went along with the
> original implementation.

Have you seen crashes as the result of dlopen or dlsym failures in
secondary namespaces?

Thanks,
Florian
Daniel Walker \(danielwa\) June 27, 2020, 1:10 a.m. UTC | #3
On Fri, Jun 26, 2020 at 11:30:12PM +0200, Florian Weimer wrote:
> * Daniel Walker via Libc-alpha:
> 
> > Also included in this are a couple of fixes which went along with the
> > original implementation.
> 
> Have you seen crashes as the result of dlopen or dlsym failures in
> secondary namespaces?

++ Conan

Conan has done the most work on this, and I think he's working on it in terms of
the product usage. I neglected to include him on this cover email. I've added
him to this email hopefully he can respond.

Daniel
Conan Huang \(conhuang\) July 2, 2020, 1:54 p.m. UTC | #4
The only crash we saw was the promotion of RTLD_GLOBAL flag in secondary 
namespace.  Apart from that we didn't notice any other crashes or dlsym failures.

However, we did noticed a design limitation with static TLS. Where shared objects
with static TLS can quickly use up static TLS block reserved by the loader. This
usually isn't a problem since only a few core libraries have static TLS and they are
not dlopened. However, during each dlmopen these core libraries like libc are
loaded and its static TLS uses up valuable space in static TLS block.  Resulting to:

	libc.so.6: cannot allocate memory in static TLS block

We are currently looking at how this can be enhanced. Maybe you guys already 
have discussions around this issue.


´╗┐On 2020-06-26, 9:11 PM, "Daniel Walker (danielwa)" <danielwa@cisco.com> wrote:

    On Fri, Jun 26, 2020 at 11:30:12PM +0200, Florian Weimer wrote:
    > * Daniel Walker via Libc-alpha:
    > 
    > > Also included in this are a couple of fixes which went along with the
    > > original implementation.
    > 
    > Have you seen crashes as the result of dlopen or dlsym failures in
    > secondary namespaces?

    ++ Conan

    Conan has done the most work on this, and I think he's working on it in terms of
    the product usage. I neglected to include him on this cover email. I've added
    him to this email hopefully he can respond.

    Daniel
Daniel Walker \(danielwa\) July 23, 2020, 6:40 p.m. UTC | #5
On Fri, Jun 26, 2020 at 05:17:17PM -0400, Carlos O'Donell wrote:
> On 6/26/20 3:32 PM, Daniel Walker via Libc-alpha wrote:
> > Cisco System, Inc. has a need to have dlmopen support in gdb, which
> > required glibc changes. I think it was known when glibc implemented
> > dlmopen that gdb would not work with it.
> > 
> > Since 2015 Cisco has had these patches in our inventor to fix issues in
> > glibc which prevented this type of gdb usage.
> > 
> > This RFC is mainly to get guidance on this implementation. We have some
> > individuals who have signed the copyright assignment for glibc, and we
> > will submit these (or different patches) formally thru those channels if
> > no one has issues with the implementation.
> > 
> > Also included in this are a couple of fixes which went along with the
> > original implementation.
> > 
> > Please provide any comments you might have.
> > 
> > Conan C Huang (3):
> >   Segfault when dlopen with RTLD_GLOBAL in dlmopened library
> >   glibc: dlopen RTLD_NOLOAD optimization
> >   add r_debug multiple namespaces support
> > 
> >  elf/dl-close.c |  7 ++++++-
> >  elf/dl-debug.c | 13 ++++++++++---
> >  elf/dl-open.c  |  8 +++++++-
> >  elf/link.h     |  4 ++++
> >  4 files changed, 27 insertions(+), 5 deletions(-)
> > 
> 
> Thanks for looking at this. It is something the community would
> absolutely like to see. I'll comment quickly to provide direction.
> 
> Florian Weimer, Pedro Alves, and I were talking about this as
> recently as April where we tried to agree to just adding a
> _r_debug_dlmopen with a new ABI for the debugger to use.
> 


Here's another RFC I suppose. It's basic code I've only compile tested. It's
based on the comments, and the threads you provided. It just abstracts out the
next link into another structure. Let me know if this is in the ballpark of the
discussions.


---
 elf/dl-debug.c             | 19 +++++++++++++++++--
 elf/link.h                 |  6 ++++++
 sysdeps/generic/ldsodefs.h |  1 +
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/elf/dl-debug.c b/elf/dl-debug.c
index 4b3d3ad6ba..d0009744f8 100644
--- a/elf/dl-debug.c
+++ b/elf/dl-debug.c
@@ -35,6 +35,7 @@ extern const int verify_link_map_members[(VERIFY_MEMBER (l_addr)
    a statically-linked program there is no dynamic section for the debugger
    to examine and it looks for this particular symbol name.  */
 struct r_debug _r_debug;
+struct r_debug_dlmopen _r_debug_dlmopen;
 
 
 /* Initialize _r_debug if it has not already been done.  The argument is
@@ -45,11 +46,22 @@ struct r_debug *
 _dl_debug_initialize (ElfW(Addr) ldbase, Lmid_t ns)
 {
   struct r_debug *r;
+  struct r_debug_dlmopen *r_ns, *rp_ns;
 
   if (ns == LM_ID_BASE)
-    r = &_r_debug;
+    {
+      r = &_r_debug;
+      r_ns = &_r_debug_dlmopen;
+    }
   else
-    r = &GL(dl_ns)[ns]._ns_debug;
+    {
+      r = &GL(dl_ns)[ns]._ns_debug;
+      r_ns = &GL(dl_ns)[ns]._ns_debug_dlmopen;
+      rp_ns = &GL(dl_ns)[ns - 1]._ns_debug_dlmopen;
+      rp_ns->next = r_ns;
+      if (ns - 1 == LM_ID_BASE)
+        _r_debug_dlmopen.next = r_ns;
+    }
 
   if (r->r_map == NULL || ldbase != 0)
     {
@@ -58,6 +70,9 @@ _dl_debug_initialize (ElfW(Addr) ldbase, Lmid_t ns)
       r->r_ldbase = ldbase ?: _r_debug.r_ldbase;
       r->r_map = (void *) GL(dl_ns)[ns]._ns_loaded;
       r->r_brk = (ElfW(Addr)) &_dl_debug_state;
+      r_ns->r_debug = r;
+      r_ns->next = NULL;
+
     }
 
   return r;
diff --git a/elf/link.h b/elf/link.h
index 0048ad5d4d..c81945b671 100644
--- a/elf/link.h
+++ b/elf/link.h
@@ -63,8 +63,14 @@ struct r_debug
     ElfW(Addr) r_ldbase;	/* Base address the linker is loaded at.  */
   };
 
+struct r_debug_dlmopen
+  {
+    struct r_debug *r_debug;
+    struct r_debug_dlmopen *next;
+  };
 /* This is the instance of that structure used by the dynamic linker.  */
 extern struct r_debug _r_debug;
+extern struct r_debug_dlmopen _r_debug_dlmopen;
 
 /* This symbol refers to the "dynamic structure" in the `.dynamic' section
    of whatever module refers to `_DYNAMIC'.  So, to find its own
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index ba114ab4b1..d9794bc7a0 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -357,6 +357,7 @@ struct rtld_global
     } _ns_unique_sym_table;
     /* Keep track of changes to each namespace' list.  */
     struct r_debug _ns_debug;
+    struct r_debug_dlmopen _ns_debug_dlmopen;
   } _dl_ns[DL_NNS];
   /* One higher than index of last used namespace.  */
   EXTERN size_t _dl_nns;
Carlos O'Donell July 23, 2020, 9:20 p.m. UTC | #6
On 7/23/20 2:40 PM, Daniel Walker (danielwa) wrote:
> On Fri, Jun 26, 2020 at 05:17:17PM -0400, Carlos O'Donell wrote:
>> On 6/26/20 3:32 PM, Daniel Walker via Libc-alpha wrote:
>>> Cisco System, Inc. has a need to have dlmopen support in gdb, which
>>> required glibc changes. I think it was known when glibc implemented
>>> dlmopen that gdb would not work with it.
>>>
>>> Since 2015 Cisco has had these patches in our inventor to fix issues in
>>> glibc which prevented this type of gdb usage.
>>>
>>> This RFC is mainly to get guidance on this implementation. We have some
>>> individuals who have signed the copyright assignment for glibc, and we
>>> will submit these (or different patches) formally thru those channels if
>>> no one has issues with the implementation.
>>>
>>> Also included in this are a couple of fixes which went along with the
>>> original implementation.
>>>
>>> Please provide any comments you might have.
>>>
>>> Conan C Huang (3):
>>>   Segfault when dlopen with RTLD_GLOBAL in dlmopened library
>>>   glibc: dlopen RTLD_NOLOAD optimization
>>>   add r_debug multiple namespaces support
>>>
>>>  elf/dl-close.c |  7 ++++++-
>>>  elf/dl-debug.c | 13 ++++++++++---
>>>  elf/dl-open.c  |  8 +++++++-
>>>  elf/link.h     |  4 ++++
>>>  4 files changed, 27 insertions(+), 5 deletions(-)
>>>
>>
>> Thanks for looking at this. It is something the community would
>> absolutely like to see. I'll comment quickly to provide direction.
>>
>> Florian Weimer, Pedro Alves, and I were talking about this as
>> recently as April where we tried to agree to just adding a
>> _r_debug_dlmopen with a new ABI for the debugger to use.
>>
> 
> 
> Here's another RFC I suppose. It's basic code I've only compile tested. It's
> based on the comments, and the threads you provided. It just abstracts out the
> next link into another structure. Let me know if this is in the ballpark of the
> discussions.

I only looked over this briefly, but I think it's on the right track.

The point is to use *another* data symbol for the debugger to use to access
the link maps. Then the debugger can look for that and try to use that to
access a list of maps.

Your next step would be to export the symbol via Versions at the current
symbol node GLIBC_2.32 (soon to be GLIBC_2.33).

The harder part will be the debugger changes because you have to look for
_r_debug_dlmopen in preference to _r_debug, and they are different layouts,
and once you find _r_debug_dlmopen you have to be able to maintain the
lookup scope of the namespace you're in within the debugger.


> ---
>  elf/dl-debug.c             | 19 +++++++++++++++++--
>  elf/link.h                 |  6 ++++++
>  sysdeps/generic/ldsodefs.h |  1 +
>  3 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/elf/dl-debug.c b/elf/dl-debug.c
> index 4b3d3ad6ba..d0009744f8 100644
> --- a/elf/dl-debug.c
> +++ b/elf/dl-debug.c
> @@ -35,6 +35,7 @@ extern const int verify_link_map_members[(VERIFY_MEMBER (l_addr)
>     a statically-linked program there is no dynamic section for the debugger
>     to examine and it looks for this particular symbol name.  */
>  struct r_debug _r_debug;
> +struct r_debug_dlmopen _r_debug_dlmopen;
>  
>  
>  /* Initialize _r_debug if it has not already been done.  The argument is
> @@ -45,11 +46,22 @@ struct r_debug *
>  _dl_debug_initialize (ElfW(Addr) ldbase, Lmid_t ns)
>  {
>    struct r_debug *r;
> +  struct r_debug_dlmopen *r_ns, *rp_ns;
>  
>    if (ns == LM_ID_BASE)
> -    r = &_r_debug;
> +    {
> +      r = &_r_debug;
> +      r_ns = &_r_debug_dlmopen;
> +    }
>    else
> -    r = &GL(dl_ns)[ns]._ns_debug;
> +    {
> +      r = &GL(dl_ns)[ns]._ns_debug;
> +      r_ns = &GL(dl_ns)[ns]._ns_debug_dlmopen;
> +      rp_ns = &GL(dl_ns)[ns - 1]._ns_debug_dlmopen;
> +      rp_ns->next = r_ns;
> +      if (ns - 1 == LM_ID_BASE)
> +        _r_debug_dlmopen.next = r_ns;
> +    }
>  
>    if (r->r_map == NULL || ldbase != 0)
>      {
> @@ -58,6 +70,9 @@ _dl_debug_initialize (ElfW(Addr) ldbase, Lmid_t ns)
>        r->r_ldbase = ldbase ?: _r_debug.r_ldbase;
>        r->r_map = (void *) GL(dl_ns)[ns]._ns_loaded;
>        r->r_brk = (ElfW(Addr)) &_dl_debug_state;
> +      r_ns->r_debug = r;
> +      r_ns->next = NULL;
> +
>      }
>  
>    return r;
> diff --git a/elf/link.h b/elf/link.h
> index 0048ad5d4d..c81945b671 100644
> --- a/elf/link.h
> +++ b/elf/link.h
> @@ -63,8 +63,14 @@ struct r_debug
>      ElfW(Addr) r_ldbase;	/* Base address the linker is loaded at.  */
>    };
>  
> +struct r_debug_dlmopen
> +  {
> +    struct r_debug *r_debug;
> +    struct r_debug_dlmopen *next;
> +  };
>  /* This is the instance of that structure used by the dynamic linker.  */
>  extern struct r_debug _r_debug;
> +extern struct r_debug_dlmopen _r_debug_dlmopen;
>  
>  /* This symbol refers to the "dynamic structure" in the `.dynamic' section
>     of whatever module refers to `_DYNAMIC'.  So, to find its own
> diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
> index ba114ab4b1..d9794bc7a0 100644
> --- a/sysdeps/generic/ldsodefs.h
> +++ b/sysdeps/generic/ldsodefs.h
> @@ -357,6 +357,7 @@ struct rtld_global
>      } _ns_unique_sym_table;
>      /* Keep track of changes to each namespace' list.  */
>      struct r_debug _ns_debug;
> +    struct r_debug_dlmopen _ns_debug_dlmopen;
>    } _dl_ns[DL_NNS];
>    /* One higher than index of last used namespace.  */
>    EXTERN size_t _dl_nns;
>