Message ID | 20200626193228.1953-1-danielwa@cisco.com |
---|---|
Series | implement dlmopen hooks for gdb |
On 6/26/20 3:32 PM, Daniel Walker via Libc-alpha wrote:
> Cisco System, Inc. has a need to have dlmopen support in gdb, which
> required glibc changes. I think it was known when glibc implemented
> dlmopen that gdb would not work with it.
>
> Since 2015 Cisco has had these patches in our inventory to fix issues in
> glibc which prevented this type of gdb usage.
>
> This RFC is mainly to get guidance on this implementation. We have some
> individuals who have signed the copyright assignment for glibc, and we
> will submit these (or different patches) formally thru those channels if
> no one has issues with the implementation.
>
> Also included in this are a couple of fixes which went along with the
> original implementation.
>
> Please provide any comments you might have.
>
> Conan C Huang (3):
>   Segfault when dlopen with RTLD_GLOBAL in dlmopened library
>   glibc: dlopen RTLD_NOLOAD optimization
>   add r_debug multiple namespaces support
>
>  elf/dl-close.c |  7 ++++++-
>  elf/dl-debug.c | 13 ++++++++++---
>  elf/dl-open.c  |  8 +++++++-
>  elf/link.h     |  4 ++++
>  4 files changed, 27 insertions(+), 5 deletions(-)
>

Thanks for looking at this. It is something the community would
absolutely like to see. I'll comment quickly to provide direction.

Florian Weimer, Pedro Alves, and I were talking about this as
recently as April where we tried to agree to just adding a
_r_debug_dlmopen with a new ABI for the debugger to use.

Your proposed solution of bumping the version is unacceptable, and
was last rejected by Roland McGrath.

It is easier from a backwards compatibility perspective to add a new
_r_debug_dlmopen and use that instead.

The problem is that when you bump the version the current gdb checks
for r_version != 1 and issues a warning, but keeps going:

    if (linux_read_memory (priv->r_debug + lmo->r_version_offset,
                           (unsigned char *) &r_version,
                           sizeof (r_version)) != 0
        || r_version != 1)
           ^^^^^^^^^^^^^^
      {
        warning ("unexpected r_debug version %d", r_version);
      }

This is a bad precedent: other software might have hard checks for
r_version != 1 and stop operating correctly.

I suggest reviewing these threads:
https://sourceware.org/legacy-ml/libc-alpha/2012-11/msg00182.html
https://sourceware.org/legacy-ml/libc-alpha/2012-12/msg00278.html
https://sourceware.org/legacy-ml/libc-alpha/2013-01/msg00045.html

An alternative suggested in 2012 was to add a new DT_* entry to point
to the extended debug information e.g. DT_DEBUG_EXTENDED, and so
avoid needing ld.so for lookup of _r_debug_dlmopen.

Gary Benson also suggests versioning the new structure, but being
very clear what a "version bump" means, in that we compatibly add
elements to the end after each version change. So all consumers
would check _r_debug_dlmopen.r_version > 1 to know they had at
least v1 elements.

And for reference from Solaris:
https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-1247.html#chapter6-15

I'd want to avoid having to run code to get at these objects, since
experience has shown this is always going to cause problems. Having
an entirely data-driven approach would be preferable, but locks us
into an ABI that we have to be able to bump.
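The append-only versioning rule attributed to Gary Benson above can be sketched from the consumer's side. This is a minimal illustration, not a glibc or gdb interface: the `dbg_ext_*` names and both structure layouts are hypothetical, invented here only to show why a consumer should test "at least version N" rather than "exactly version N" (the gdb check quoted above is of the latter kind).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layouts, for illustration only.  Version 2 is version 1
   with one field appended; nothing is reordered or removed.  */
struct dbg_ext_v1
{
  int r_version;   /* Bumped only when fields are appended.  */
  void *r_first;   /* Present since version 1.  */
};

struct dbg_ext_v2
{
  int r_version;
  void *r_first;
  void *r_extra;   /* Appended in version 2.  */
};

/* Return how many leading bytes a consumer that understands versions 1
   and 2 may read, or 0 if the version is unusable.  A version newer than
   the consumer knows is accepted, because newer versions only append.  */
size_t
dbg_ext_readable_size (int r_version)
{
  if (r_version < 1)
    return 0;
  if (r_version == 1)
    return sizeof (struct dbg_ext_v1);
  return sizeof (struct dbg_ext_v2);
}

/* True if the field appended in version 2 is available.  */
bool
dbg_ext_has_extra (int r_version)
{
  return r_version >= 2;
}
```

With this shape of check, an older consumer keeps working against a newer producer, which is the property a hard `!= 1` comparison gives up.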
* Daniel Walker via Libc-alpha:

> Also included in this are a couple of fixes which went along with the
> original implementation.

Have you seen crashes as the result of dlopen or dlsym failures in
secondary namespaces?

Thanks,
Florian
On Fri, Jun 26, 2020 at 11:30:12PM +0200, Florian Weimer wrote:
> * Daniel Walker via Libc-alpha:
>
> > Also included in this are a couple of fixes which went along with the
> > original implementation.
>
> Have you seen crashes as the result of dlopen or dlsym failures in
> secondary namespaces?

++ Conan

Conan has done the most work on this, and I think he's working on it in terms of
the product usage. I neglected to include him on this cover email. I've added
him to this email; hopefully he can respond.

Daniel
The only crash we saw was the promotion of the RTLD_GLOBAL flag in a secondary
namespace. Apart from that we didn't notice any other crashes or dlsym failures.

However, we did notice a design limitation with static TLS: shared objects with
static TLS can quickly use up the static TLS block reserved by the loader. This
usually isn't a problem since only a few core libraries have static TLS and they
are not dlopened. However, during each dlmopen these core libraries, such as
libc, are loaded again and their static TLS uses up valuable space in the static
TLS block, resulting in:

    libc.so.6: cannot allocate memory in static TLS block

We are currently looking at how this can be enhanced. Maybe you guys already
have discussions around this issue.

On 2020-06-26, 9:11 PM, "Daniel Walker (danielwa)" <danielwa@cisco.com> wrote:

    On Fri, Jun 26, 2020 at 11:30:12PM +0200, Florian Weimer wrote:
    > Have you seen crashes as the result of dlopen or dlsym failures in
    > secondary namespaces?

    ++ Conan
On Fri, Jun 26, 2020 at 05:17:17PM -0400, Carlos O'Donell wrote:
> Thanks for looking at this. It is something the community would
> absolutely like to see. I'll comment quickly to provide direction.
>
> Florian Weimer, Pedro Alves, and I were talking about this as
> recently as April where we tried to agree to just adding a
> _r_debug_dlmopen with a new ABI for the debugger to use.
>

Here's another RFC I suppose. It's basic code I've only compile tested. It's
based on the comments, and the threads you provided. It just abstracts out the
next link into another structure. Let me know if this is in the ballpark of the
discussions.
---
 elf/dl-debug.c             | 19 +++++++++++++++++--
 elf/link.h                 |  6 ++++++
 sysdeps/generic/ldsodefs.h |  1 +
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/elf/dl-debug.c b/elf/dl-debug.c
index 4b3d3ad6ba..d0009744f8 100644
--- a/elf/dl-debug.c
+++ b/elf/dl-debug.c
@@ -35,6 +35,7 @@ extern const int verify_link_map_members[(VERIFY_MEMBER (l_addr)
    a statically-linked program there is no dynamic section for the debugger
    to examine and it looks for this particular symbol name.  */
 struct r_debug _r_debug;
+struct r_debug_dlmopen _r_debug_dlmopen;
 
 
 /* Initialize _r_debug if it has not already been done.  The argument is
@@ -45,11 +46,22 @@ struct r_debug *
 _dl_debug_initialize (ElfW(Addr) ldbase, Lmid_t ns)
 {
   struct r_debug *r;
+  struct r_debug_dlmopen *r_ns, *rp_ns;
 
   if (ns == LM_ID_BASE)
-    r = &_r_debug;
+    {
+      r = &_r_debug;
+      r_ns = &_r_debug_dlmopen;
+    }
   else
-    r = &GL(dl_ns)[ns]._ns_debug;
+    {
+      r = &GL(dl_ns)[ns]._ns_debug;
+      r_ns = &GL(dl_ns)[ns]._ns_debug_dlmopen;
+      rp_ns = &GL(dl_ns)[ns - 1]._ns_debug_dlmopen;
+      rp_ns->next = r_ns;
+      if (ns - 1 == LM_ID_BASE)
+        _r_debug_dlmopen.next = r_ns;
+    }
 
   if (r->r_map == NULL || ldbase != 0)
     {
@@ -58,6 +70,9 @@ _dl_debug_initialize (ElfW(Addr) ldbase, Lmid_t ns)
       r->r_ldbase = ldbase ?: _r_debug.r_ldbase;
       r->r_map = (void *) GL(dl_ns)[ns]._ns_loaded;
       r->r_brk = (ElfW(Addr)) &_dl_debug_state;
+      r_ns->r_debug = r;
+      r_ns->next = NULL;
+
     }
 
   return r;
diff --git a/elf/link.h b/elf/link.h
index 0048ad5d4d..c81945b671 100644
--- a/elf/link.h
+++ b/elf/link.h
@@ -63,8 +63,14 @@ struct r_debug
     ElfW(Addr) r_ldbase;	/* Base address the linker is loaded at.  */
   };
 
+struct r_debug_dlmopen
+  {
+    struct r_debug *r_debug;
+    struct r_debug_dlmopen *next;
+  };
 /* This is the instance of that structure used by the dynamic linker.  */
 extern struct r_debug _r_debug;
+extern struct r_debug_dlmopen _r_debug_dlmopen;
 
 /* This symbol refers to the "dynamic structure" in the `.dynamic' section
    of whatever module refers to `_DYNAMIC'.  So, to find its own
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index ba114ab4b1..d9794bc7a0 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -357,6 +357,7 @@ struct rtld_global
     } _ns_unique_sym_table;
     /* Keep track of changes to each namespace' list.  */
     struct r_debug _ns_debug;
+    struct r_debug_dlmopen _ns_debug_dlmopen;
   } _dl_ns[DL_NNS];
   /* One higher than index of last used namespace.  */
   EXTERN size_t _dl_nns;
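For readers following the RFC, here is a rough sketch of how a consumer might traverse the proposed list, one node per namespace. It is an in-process simplification under stated assumptions: the `*_sketch` types are local stand-ins that mirror only the fields used here, `walk_namespaces` is a hypothetical helper, and a real debugger would read each node from the inferior's memory rather than dereference pointers directly.

```c
#include <stddef.h>

/* Local stand-ins for the structures in the RFC.  The real struct r_debug
   has more fields than this sketch needs.  */
struct r_debug_sketch
{
  int r_version;
  void *r_map;
};

struct r_debug_dlmopen_sketch
{
  struct r_debug_sketch *r_debug;
  struct r_debug_dlmopen_sketch *next;
};

/* Visit the r_debug of each namespace reachable from HEAD and return the
   number of list nodes seen.  At most MAX nodes are followed, so a
   corrupted or cyclic list cannot make the walk loop forever.  VISIT may
   be NULL if only the count is wanted.  */
size_t
walk_namespaces (const struct r_debug_dlmopen_sketch *head, size_t max,
                 void (*visit) (const struct r_debug_sketch *, void *),
                 void *cookie)
{
  size_t seen = 0;
  for (const struct r_debug_dlmopen_sketch *p = head;
       p != NULL && seen < max; p = p->next)
    {
      if (p->r_debug != NULL && visit != NULL)
        visit (p->r_debug, cookie);
      seen++;
    }
  return seen;
}
```

The node bound is a defensive choice for data read from a possibly crashed process; the RFC itself does not specify one.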
On 7/23/20 2:40 PM, Daniel Walker (danielwa) wrote:
> Here's another RFC I suppose. It's basic code I've only compile tested. It's
> based on the comments, and the threads you provided. It just abstracts out the
> next link into another structure. Let me know if this is in the ballpark of the
> discussions.

I only looked over this briefly, but I think it's on the right track.

The point is to use *another* data symbol for the debugger to use to access
the link maps. Then the debugger can look for that and try to use that to
access a list of maps.

Your next step would be to export the symbol via Versions at the current
symbol node GLIBC_2.32 (soon to be GLIBC_2.33).

The harder part will be the debugger changes because you have to look for
_r_debug_dlmopen in preference to _r_debug, and they are different layouts,
and once you find _r_debug_dlmopen you have to be able to maintain the
lookup scope of the namespace you're in within the debugger.
On Thu, Jul 23, 2020 at 05:20:23PM -0400, Carlos O'Donell wrote:
> I only looked over this briefly, but I think it's on the right track.
>
> The point is to use *another* data symbol for the debugger to use to access
> the link maps. Then the debugger can look for that and try to use that to
> access a list of maps.
>
> Your next step would be to export the symbol via Versions at the current
> symbol node GLIBC_2.32 (soon to be GLIBC_2.33).
>
> The harder part will be the debugger changes because you have to look for
> _r_debug_dlmopen in preference to _r_debug, and they are different layouts,
> and once you find _r_debug_dlmopen you have to be able to maintain the
> lookup scope of the namespace you're in within the debugger.
>

The second structure seems to work, except for making it available to GDB. I
would guess there are suggestions for this from you or this list.

A couple of ideas:

1) GDB does pointer arithmetic off the r_debug DT_DEBUG value to find the
r_debug_dlmopen structure. Add a linker script into glibc to force the two
structures' arrangement in memory, or use a section tag.

2) Add another dynamic linker entry to go along with DT_DEBUG like
DT_DEBUG_DLMOPEN.

Any other ideas for this?

Thanks,
Daniel
On 9/16/20 12:18 PM, Daniel Walker (danielwa) wrote:
> The second structure seems to work except making it available to GDB. I would
> guess there are suggestions for this from you or this list.
>
> A couple ideas,
>
> 1) GDB does pointer arithmetic off the r_debug DT_DEBUG value to find the
> r_debug_dlmopen structure. Add a linker script into glibc to force the two
> structures arrangement in memory, or use a section tag.

In gdbserver I see that it's using DT_DEBUG exclusively to find _r_debug.

In gdb/solib-svr4.c:

    /* Find DT_DEBUG.  */
    if (scan_dyntag (DT_DEBUG, exec_bfd, &dyn_ptr, NULL)
        || scan_dyntag_auxv (DT_DEBUG, &dyn_ptr, NULL))
      return dyn_ptr;

    /* This may be a static executable.  Look for the symbol
       conventionally named _r_debug, as a last resort.  */
    msymbol = lookup_minimal_symbol ("_r_debug", NULL, symfile_objfile);
    if (msymbol.minsym != NULL)
      return BMSYMBOL_VALUE_ADDRESS (msymbol);

This code makes the most sense to me. You look for DT_DEBUG, otherwise look up
_r_debug (which is _r_debug@@GLIBC_2.2.5 on x86_64).

I would say that finding _r_debug_dlmopen would require looking up the symbol,
not as a last resort, but as a definition of the API. You will always have
.dynsym with a definition for _r_debug_dlmopen.

> 2) Add another dynamic linker entry to go along with DT_DEBUG like
> DT_DEBUG_DLMOPEN.

This is one way which avoids hard coding _r_debug_dlmopen and instead puts
it into a DT_* tag, but requires we add a new tag. I have no strong opinion
here. Having the tag avoids going through the symbol lookup, so it could
have good value.

In gdbserver/linux-low.cc we have get_r_debug, which doesn't do anything but
look at DT_DEBUG. This would need changing to look up _r_debug_dlmopen in
that area, or DT_DEBUG_DLMOPEN.

However, looking at my i686/x86_64 system I don't see DT_DEBUG being
set so I don't know how this works with gdbserver? I could have sworn
we were using DT_DEBUG on x86... if we don't then we should fix that,
but that's another bug.
* Carlos O'Donell:
> You will always have .dynsym with a definition for _r_debug_dlmopen.
Note that this doesn't work if you just have a core file. In order to
find _r_debug (or _r_debug_dlmopen), a debugger needs the exact same
copy of ld.so that was used by the executable, otherwise the symbol
cannot be found in the image.
Thanks,
Florian
On 9/17/20 8:52 AM, Carlos O'Donell wrote:
> However, looking at my i686/x86_64 system I don't see DT_DEBUG being
> set so I don't know how this works with gdbserver? I could have sworn
> we were using DT_DEBUG on x86... if we don't then we should fix that,
> but that's another bug.

I looked at the wrong thing. We *are* creating a DT_DEBUG entry.

The loader fills DT_DEBUG with &_r_debug at runtime.

Core files have DT_DEBUG with the runtime &_r_debug value.

This means core file debugging can avoid needing to look things up in ld.so.
On 9/17/20 8:59 AM, Florian Weimer wrote:
> * Carlos O'Donell:
>
>> You will always have .dynsym with a definition for _r_debug_dlmopen.
>
> Note that this doesn't work if you just have a core file. In order to
> find _r_debug (or _r_debug_dlmopen), a debugger needs the exact same
> copy of ld.so that was used by the executable, otherwise the symbol
> cannot be found in the image.

You are correct.

I followed up on my own email regarding this.

So in the end to get process and core file debugging we'll need:

* _r_debug_dlmopen
* DT_DEBUG_DLMOPEN
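One way a debugger could combine the two mechanisms listed above is sketched below: try a dynamic-section tag first, which also works from a core file, and fall back to a symbol address when the tag is absent. This is only an illustration of one possible ordering, not what gdb does. `find_dyn_tag` and `locate_debug_base` are hypothetical helpers, the tag is passed as a parameter because DT_DEBUG_DLMOPEN has no assigned value, and the sketch assumes a 64-bit ELF target and the Linux `<elf.h>` header.

```c
#include <elf.h>
#include <stddef.h>

/* Scan at most COUNT entries of a dynamic section for TAG, stopping at
   DT_NULL.  Return 1 and store the entry's d_ptr in *VALUE if found,
   otherwise return 0 and leave *VALUE untouched.  */
int
find_dyn_tag (const Elf64_Dyn *dyn, size_t count, Elf64_Sxword tag,
              Elf64_Addr *value)
{
  for (size_t i = 0; i < count && dyn[i].d_tag != DT_NULL; i++)
    if (dyn[i].d_tag == tag)
      {
        *value = dyn[i].d_un.d_ptr;
        return 1;
      }
  return 0;
}

/* Prefer the address recorded under TAG in the dynamic section.  If the
   tag is missing or still zero, fall back to SYMBOL_ADDR, the address the
   caller obtained by symbol lookup (0 if that lookup failed too).  */
Elf64_Addr
locate_debug_base (const Elf64_Dyn *dyn, size_t count, Elf64_Sxword tag,
                   Elf64_Addr symbol_addr)
{
  Elf64_Addr value;
  if (dyn != NULL && find_dyn_tag (dyn, count, tag, &value) && value != 0)
    return value;
  return symbol_addr;
}
```

A result of 0 means neither source was available, in which case the consumer would presumably fall back to the existing `_r_debug` handling.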
On Thu, Sep 17, 2020 at 09:53:30AM -0400, Carlos O'Donell wrote:
> So in the end to get process and core file debugging we'll need:
>
> * _r_debug_dlmopen
> * DT_DEBUG_DLMOPEN
>

It seems like adding DT_DEBUG_DLMOPEN into the gABI might take some effort. Have
you considered this? The last one which was added was DT_SYMTAB_SHNDX in 2018,
and it looks like it did not come from glibc.

Daniel
* Daniel Walker:

> It seems like adding DT_DEBUG_DLMOPEN into the gABI might take some
> effort. Have you considered this ? The last one which was added was
> DT_SYMTAB_SHNDX in 2018, and it looks like it did not come from glibc.

We are reviving GNU gABI maintenance. There's been quite a bit of list
activity, and a proposal of a first ABI document:

<https://sourceware.org/pipermail/gnu-gabi/2020q3/thread.html>

I have a feeling that we might be soon over this bump, and getting
things added should become easier.

In the meantime, can we demo this feature without DT_DEBUG_DLMOPEN?
With a patched glibc and gdb? Incidentally, I have an LD_AUDIT issue I
need to debug. 8-)

Thanks,
Florian
On Fri, Sep 18, 2020 at 05:40:30PM +0200, Florian Weimer wrote:
> We are reviving GNU gABI maintenance. There's been quite a bit of list
> activity, and a proposal of a first ABI document:
>
> <https://sourceware.org/pipermail/gnu-gabi/2020q3/thread.html>
>
> I have a feeling that we might be soon over this bump, and getting
> things added should become easier.
>
> In the meantime, can we demo this feature without DT_DEBUG_DLMOPEN?
> With a patch glibc and gdb? Incidentally, I have an LD_AUDIT issue I
> need to debug. 8-)
>

The only fully working version we have is the one I released originally. Yes,
that version had no DT_DEBUG_DLMOPEN. It should be working and you can demo it.

We're still working on updating GDB to use the new interfaces.

In terms of updating the gABI, should I just add a patch to glibc to add values
or do I need special documents to be submitted?

Daniel
* Carlos O'Donell via Libc-alpha:

> Your next step would be to export the symbol via Versions at the current
> symbol node GLIBC_2.32 (soon to be GLIBC_2.33).

Can we create a new GLIBC_DEBUG symbol version for symbols which are
not intended to be used for run-time linking?

The idea is that consumers will have to deal with the absence of these
symbols anyway, so we just need one symbol version that does not depend
on the glibc version for this. Dependency management considerations
(that apply to symbols with run-time linking) do not come into play here.

Thanks,
Florian
On 9/22/20 1:06 PM, Florian Weimer wrote:
> Can we create a new GLIBC_DEBUG symbol versions for symbols which are
> not intended to be used for run-time linking?
>
> The idea is that consumers will have deal with the absence of these
> symbols anyway, so we just need one symbol version that does not depend
> on the glibc version for this. Dependency management considerations
> (that apply to symbols with run-time linking) do not come into play here.

I don't object to GLIBC_DEBUG, like GLIBC_PRIVATE it can be considered
a transient ABI that is valid only for a major release?
* Carlos O'Donell:

> I don't object to GLIBC_DEBUG, like GLIBC_PRIVATE it can be considered
> a transient ABI that is valid only for a major release?

No, unlike GLIBC_PRIVATE, you can assume that if a GLIBC_DEBUG symbol is
there (and perhaps has the documented size), it has the documented
semantics. But you can't assume that it is present.

The semantics of GLIBC_PRIVATE symbols can change arbitrarily, even
between builds.

Thanks,
Florian
On 9/22/20 1:37 PM, Florian Weimer wrote:
> No, unlike GLIBC_PRIVATE, you can assume that if a GLIBC_DEBUG symbol is
> there (and perhaps has the documented size), it has the documented
> semantics. But you can't assume that it is present.
>
> The semantics of GLIBC_PRIVATE symbols can change arbitrarily, even
> between builds.

Yes, absolutely, I agree completely, for it to be useful the semantics
have to be:

- If you detect a given symbol foo@GLIBC_DEBUG, then the feature is
  present and has the semantics you expect.

- If you want new semantics then you need to make a foo2@GLIBC_DEBUG
  with the new semantics.

What are the runtime semantics of the symbol? How do you access it?
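The contract discussed above (a GLIBC_DEBUG symbol may be absent, but if present with the documented size it has the documented meaning) can be sketched from a consumer's point of view. Everything here is hypothetical: `struct decoded_sym` stands for whatever a debugger has already decoded from a symbol table, and `find_debug_symbol` is an illustrative helper, not an existing glibc or gdb function.

```c
#include <stddef.h>
#include <string.h>

/* One symbol-table entry as a debugger might have decoded it.
   Hypothetical type for this sketch only.  */
struct decoded_sym
{
  const char *name;
  const char *version;
  size_t size;
};

/* Return the entry for NAME if it exists at version GLIBC_DEBUG with the
   documented size, otherwise NULL.  A NULL result is a normal outcome,
   not an error: the caller is expected to fall back to older
   interfaces.  */
const struct decoded_sym *
find_debug_symbol (const struct decoded_sym *syms, size_t count,
                   const char *name, size_t documented_size)
{
  for (size_t i = 0; i < count; i++)
    if (strcmp (syms[i].name, name) == 0
        && strcmp (syms[i].version, "GLIBC_DEBUG") == 0
        && syms[i].size == documented_size)
      return &syms[i];
  return NULL;
}
```

Under this reading, changed semantics would show up as a different name (foo2 rather than foo), so a consumer never has to guess which meaning an existing name carries.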
* Carlos O'Donell:

>> No, unlike GLIBC_PRIVATE, you can assume that if a GLIBC_DEBUG symbol is
>> there (and perhaps has the documented size), it has the documented
>> semantics. But you can't assume that it is present.
>>
>> The semantics of GLIBC_PRIVATE symbols can change arbitrarily, even
>> between builds.
>
> Yes, absolutely, I agree completely, for it to be useful the semantics
> have to be:
>
> - If you detect a given symbol foo@GLIBC_DEBUG, then the feature is
> present and has the semantics you expect.
>
> - If you want new semantics then you need to make a foo2@GLIBC_DEBUG
> with the new semantics.
>
> What are the runtime semantics of the symbol? How do you access it?

That obviously depends on the symbol? Sorry, I don't quite understand
these questions.

Thanks,
Florian
On Sep 22 2020, Carlos O'Donell via Libc-alpha wrote:

> Yes, absolutely, I agree completely, for it to be useful the semantics
> have to be:
>
> - If you detect a given symbol foo@GLIBC_DEBUG, then the feature is
> present and has the semantics you expect.
>
> - If you want new semantics then you need to make a foo2@GLIBC_DEBUG
> with the new semantics.
>
> What are the runtime semantics of the symbol? How do you access it?

Isn't that the same situation as libthread_db?

Andreas.
On 9/22/20 2:04 PM, Florian Weimer wrote:
> * Carlos O'Donell:
>
>>> No, unlike GLIBC_PRIVATE, you can assume that if a GLIBC_DEBUG symbol is
>>> there (and perhaps has the documented size), it has the documented
>>> semantics. But you can't assume that it is present.
>>>
>>> The semantics of GLIBC_PRIVATE symbols can change arbitrarily, even
>>> between builds.
>>
>> Yes, absolutely, I agree completely, for it to be useful the semantics
>> have to be:
>>
>> - If you detect a given symbol foo@GLIBC_DEBUG, then the feature is
>> present and has the semantics you expect.
>>
>> - If you want new semantics then you need to make a foo2@GLIBC_DEBUG
>> with the new semantics.
>>
>> What are the runtime semantics of the symbol? How do you access it?
>
> That obviously depends on the symbol? Sorry, I don't quite understand
> these questions.

You noted "not intended to be used for run-time linking?"

Could you expand on what you're thinking there?
* Carlos O'Donell:

> On 9/22/20 2:04 PM, Florian Weimer wrote:
>> * Carlos O'Donell:
>>
>>>> No, unlike GLIBC_PRIVATE, you can assume that if a GLIBC_DEBUG symbol is
>>>> there (and perhaps has the documented size), it has the documented
>>>> semantics. But you can't assume that it is present.
>>>>
>>>> The semantics of GLIBC_PRIVATE symbols can change arbitrarily, even
>>>> between builds.
>>>
>>> Yes, absolutely, I agree completely, for it to be useful the semantics
>>> have to be:
>>>
>>> - If you detect a given symbol foo@GLIBC_DEBUG, then the feature is
>>> present and has the semantics you expect.
>>>
>>> - If you want new semantics then you need to make a foo2@GLIBC_DEBUG
>>> with the new semantics.
>>>
>>> What are the runtime semantics of the symbol? How do you access it?
>>
>> That obviously depends on the symbol? Sorry, I don't quite understand
>> these questions.
>
> You noted "not intended to be used for run-time linking?"
>
> Could you expand on what you're thinking there?

If there are no versioned dependencies on the symbol at the ELF level,
then the issues that require some distributions to backport whole symbol
sets do not apply. The exact contents of the GLIBC_DEBUG symbol set
do not matter then.

Thanks,
Florian
On 9/22/20 2:44 PM, Florian Weimer wrote:
> * Carlos O'Donell:
>
>> On 9/22/20 2:04 PM, Florian Weimer wrote:
>>> * Carlos O'Donell:
>>>
>>>>> No, unlike GLIBC_PRIVATE, you can assume that if a GLIBC_DEBUG symbol is
>>>>> there (and perhaps has the documented size), it has the documented
>>>>> semantics. But you can't assume that it is present.
>>>>>
>>>>> The semantics of GLIBC_PRIVATE symbols can change arbitrarily, even
>>>>> between builds.
>>>>
>>>> Yes, absolutely, I agree completely, for it to be useful the semantics
>>>> have to be:
>>>>
>>>> - If you detect a given symbol foo@GLIBC_DEBUG, then the feature is
>>>> present and has the semantics you expect.
>>>>
>>>> - If you want new semantics then you need to make a foo2@GLIBC_DEBUG
>>>> with the new semantics.
>>>>
>>>> What are the runtime semantics of the symbol? How do you access it?
>>>
>>> That obviously depends on the symbol? Sorry, I don't quite understand
>>> these questions.
>>
>> You noted "not intended to be used for run-time linking?"
>>
>> Could you expand on what you're thinking there?
>
> If there are no versioned dependencies on the symbol at the ELF level,
> then the issues that require some distributions to backport whole symbol
> sets do not apply. The exact contents of the GLIBC_DEBUG symbol set
> do not matter then.

Thank you for the clarification. I agree completely.
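The GLIBC_DEBUG rules discussed in this sub-thread could look roughly like this from the consumer side: the debugger looks a symbol up by name and version, treats absence as a normal outcome, and prefers a newer symbol when one exists. This is only a sketch. GLIBC_DEBUG was a proposal at this point in the thread, `foo` and `foo2` are the placeholder names used in the messages above, and `struct debug_sym`, `find_debug_sym` and `probe_interface` are hypothetical helpers. A real debugger would parse .dynsym and the symbol version sections instead of using a prebuilt table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One symbol as a debugger might record it after parsing .dynsym and
   the version sections.  Hypothetical, simplified layout.  */
struct debug_sym
{
  const char *name;     /* Symbol name, e.g. "foo".  */
  const char *version;  /* Version node, e.g. "GLIBC_DEBUG".  */
  uint64_t address;     /* Address of the data object in the inferior.  */
};

/* Return the entry for NAME@GLIBC_DEBUG, or NULL if it is absent.
   Absence is expected on older libraries and must be handled.  */
static const struct debug_sym *
find_debug_sym (const struct debug_sym *table, size_t count,
                const char *name)
{
  for (size_t i = 0; i < count; i++)
    if (strcmp (table[i].name, name) == 0
        && strcmp (table[i].version, "GLIBC_DEBUG") == 0)
      return &table[i];
  return NULL;
}

/* Pick the newest interface that is present.  Returns 2 for foo2,
   1 for foo, and 0 if neither exists, in which case the caller falls
   back to whatever it did before (for example the classic _r_debug).  */
static int
probe_interface (const struct debug_sym *table, size_t count,
                 const struct debug_sym **found)
{
  if ((*found = find_debug_sym (table, count, "foo2")) != NULL)
    return 2;
  if ((*found = find_debug_sym (table, count, "foo")) != NULL)
    return 1;
  return 0;
}
```

Because the lookup never creates a versioned dependency in any ELF object, nothing here constrains which symbols a distribution has to backport together, which is the point Florian makes above.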
On 9/22/20 2:17 PM, Andreas Schwab wrote:
> On Sep 22 2020, Carlos O'Donell via Libc-alpha wrote:
>
>> Yes, absolutely, I agree completely, for it to be useful the semantics
>> have to be:
>>
>> - If you detect a given symbol foo@GLIBC_DEBUG, then the feature is
>> present and has the semantics you expect.
>>
>> - If you want new semantics then you need to make a foo2@GLIBC_DEBUG
>> with the new semantics.
>>
>> What are the runtime semantics of the symbol? How do you access it?
>
> Isn't that the same situation as libthread_db?

Yes, but coupled to libc.so, and doesn't require finding and loading
another matching library.

Taking that direction would mean creating a symbol in libthread_db.

In this particular case the symbol would provide the address of the
new structure that you could walk that contains the namespace lists
(that themselves contain linkmap lists).

In my opinion we should be heading towards the complete removal of
libthread_db from glibc because as an interface it requires that the
debugger load a library from a potentially untrusted filesystem (or
container) and execute code in order to debug the process. I would
rather see data-driven approaches where foo@GLIBC_DEBUG is a data
symbol and exposes a structure that can be walked to gather information
about the inferior.

It is also difficult if not impossible for a kernel-side agent to run
target code from libthread_db to resolve the result.

Keeping the symbol in libc.so avoids any debugger having to locate
the matching libthread_db, which is not always in the same place as
the library.

In summary:

- Use data symbols.

- Avoid needing to run code to resolve result.

- Keeps interface matched and in libc.so.
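To make the data-driven idea concrete, here is a minimal sketch of a debugger walking per-namespace link map lists using nothing but memory reads, so no code from the target is ever executed. Everything about the layout is an assumption made for illustration: `struct dbg_table`, `struct dbg_map`, the fixed `MAX_NAMESPACES`, and the use of a byte buffer as a stand-in for ptrace or core-file reads are not glibc definitions. The sketch also assumes the debugger and the inferior share word size and byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_NAMESPACES 4   /* Hypothetical fixed capacity.  */
#define MAX_WALK 4096      /* Upper bound on list length we accept.  */

/* Hypothetical layout published through a data symbol.  Every
   "pointer" is an inferior address stored as a 64-bit integer.  */
struct dbg_map
{
  uint64_t l_addr;  /* Load address of the object.  */
  uint64_t l_next;  /* Next map in this namespace, 0 at the end.  */
};

struct dbg_table
{
  uint32_t version;                  /* Layout version.  */
  uint32_t ns_count;                 /* Number of namespaces in use.  */
  uint64_t ns_head[MAX_NAMESPACES];  /* First map of each namespace.  */
};

/* Stand-in for ptrace or core-file access: a flat byte buffer.  */
struct inferior
{
  const unsigned char *mem;
  size_t size;
};

/* Copy LEN bytes at inferior address ADDR into BUF.  Returns 0 on
   success and -1 if the range is outside the inferior's memory.  */
static int
read_inferior (const struct inferior *inf, uint64_t addr,
               void *buf, size_t len)
{
  if (addr > inf->size || len > inf->size - addr)
    return -1;
  memcpy (buf, inf->mem + addr, len);
  return 0;
}

/* Count the objects loaded in namespace NS.  Returns -1 on a failed
   read, an out-of-range namespace, or a suspiciously long list.  */
static long
count_objects (const struct inferior *inf, uint64_t table_addr,
               uint32_t ns)
{
  struct dbg_table table;
  if (read_inferior (inf, table_addr, &table, sizeof table) != 0)
    return -1;
  if (table.ns_count > MAX_NAMESPACES || ns >= table.ns_count)
    return -1;

  long count = 0;
  for (uint64_t addr = table.ns_head[ns]; addr != 0; )
    {
      struct dbg_map map;
      /* Bound the walk so a corrupted, cyclic list cannot hang us.  */
      if (count >= MAX_WALK
          || read_inferior (inf, addr, &map, sizeof map) != 0)
        return -1;
      count++;
      addr = map.l_next;
    }
  return count;
}
```

The same walker works unchanged against a live process or a core file, since only `read_inferior` depends on how memory is obtained. That property is what a libthread_db-style interface, which needs to load and run a matching library, cannot offer.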
On Fri, Sep 18, 2020 at 05:40:30PM +0200, Florian Weimer wrote:
> * Daniel Walker:
>
> > On Thu, Sep 17, 2020 at 09:53:30AM -0400, Carlos O'Donell wrote:
> >> On 9/17/20 8:59 AM, Florian Weimer wrote:
> >> > * Carlos O'Donell:
> >> >
> >> >> You will always have .dynsym with a definition for _r_debug_dlmopen.
> >> >
> >> > Note that this doesn't work if you just have a core file. In order to
> >> > find _r_debug (or _r_debug_dlmopen), a debugger needs the exact same
> >> > copy of ld.so that was used by the executable, otherwise the symbol
> >> > cannot be found in the image.
> >>
> >> You are correct.
> >>
> >> I followed up on my own email regarding this.
> >>
> >> So in the end to get process and core file debugging we'll need:
> >>
> >> * _r_debug_dlmopen
> >> * DT_DEBUG_DLMOPEN
> >>
> >
> > It seems like adding DT_DEBUG_DLMOPEN into the gABI might take some
> > effort. Have you considered this? The last one which was added was
> > DT_SYMTAB_SHNDX in 2018, and it looks like it did not come from glibc.
>
> We are reviving GNU gABI maintenance. There's been quite a bit of list
> activity, and a proposal of a first ABI document:
>
> <https://sourceware.org/pipermail/gnu-gabi/2020q3/thread.html>
>
> I have a feeling that we might soon be over this bump, and getting
> things added should become easier.
>
> In the meantime, can we demo this feature without DT_DEBUG_DLMOPEN?
> With a patched glibc and gdb? Incidentally, I have an LD_AUDIT issue I
> need to debug. 8-)

Do you know what the status of taking over the gABI is?

Daniel
On Wed, Jul 28, 2021 at 11:34 AM Daniel Walker via Libc-alpha <libc-alpha@sourceware.org> wrote:
>
> On Fri, Sep 18, 2020 at 05:40:30PM +0200, Florian Weimer wrote:
> > * Daniel Walker:
> >
> > > On Thu, Sep 17, 2020 at 09:53:30AM -0400, Carlos O'Donell wrote:
> > >> On 9/17/20 8:59 AM, Florian Weimer wrote:
> > >> > * Carlos O'Donell:
> > >> >
> > >> >> You will always have .dynsym with a definition for _r_debug_dlmopen.
> > >> >
> > >> > Note that this doesn't work if you just have a core file. In order to
> > >> > find _r_debug (or _r_debug_dlmopen), a debugger needs the exact same
> > >> > copy of ld.so that was used by the executable, otherwise the symbol
> > >> > cannot be found in the image.
> > >>
> > >> You are correct.
> > >>
> > >> I followed up on my own email regarding this.
> > >>
> > >> So in the end to get process and core file debugging we'll need:
> > >>
> > >> * _r_debug_dlmopen
> > >> * DT_DEBUG_DLMOPEN
> > >>
> > >
> > > It seems like adding DT_DEBUG_DLMOPEN into the gABI might take some
> > > effort. Have you considered this? The last one which was added was
> > > DT_SYMTAB_SHNDX in 2018, and it looks like it did not come from glibc.
> >
> > We are reviving GNU gABI maintenance. There's been quite a bit of list
> > activity, and a proposal of a first ABI document:
> >
> > <https://sourceware.org/pipermail/gnu-gabi/2020q3/thread.html>
> >
> > I have a feeling that we might soon be over this bump, and getting
> > things added should become easier.
> >
> > In the meantime, can we demo this feature without DT_DEBUG_DLMOPEN?
> > With a patched glibc and gdb? Incidentally, I have an LD_AUDIT issue I
> > need to debug. 8-)
>
> Do you know what the status of taking over the gABI is?

Cary made an announcement last August:

https://groups.google.com/g/generic-abi/c/9OO5vhxb00Y/m/D-PCPis_CAAJ

But I didn't see the GitHub URL.
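For context on why a dynamic tag matters for core files: a debugger can find the debug structure by scanning the executable's dynamic array, which is present in the dumped memory, without needing the exact ld.so that ran. The sketch below shows that scan with a fallback from an extended tag to the classic DT_DEBUG. DT_DEBUG_DLMOPEN was only proposed in this thread and has no assigned number, so `TAG_DEBUG_DLMOPEN_PLACEHOLDER` is an invented value used purely for illustration. `struct dyn_entry` mirrors the Elf64_Dyn layout so the example does not depend on `<elf.h>`, and both helper functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as Elf64_Dyn: a signed tag followed by a 64-bit value.  */
struct dyn_entry
{
  int64_t d_tag;
  uint64_t d_val;
};

#define TAG_NULL  0   /* DT_NULL: marks the end of the dynamic array.  */
#define TAG_DEBUG 21  /* DT_DEBUG: address of struct r_debug.  */

/* NOT a real tag.  Placeholder for the proposed DT_DEBUG_DLMOPEN,
   which had no assigned value when this thread was written.  */
#define TAG_DEBUG_DLMOPEN_PLACEHOLDER 0x6000f00d

/* Return the value stored for TAG, or 0 if TAG is not present.
   MAX_ENTRIES bounds the scan in case DT_NULL is missing.  */
static uint64_t
find_dyn_value (const struct dyn_entry *dyn, size_t max_entries,
                int64_t tag)
{
  for (size_t i = 0; i < max_entries && dyn[i].d_tag != TAG_NULL; i++)
    if (dyn[i].d_tag == tag)
      return dyn[i].d_val;
  return 0;
}

/* Prefer the extended entry and fall back to DT_DEBUG.  *EXTENDED is
   set to 1 when the extended entry was used, 0 otherwise.  Returns 0
   if neither entry is present.  */
static uint64_t
locate_debug_base (const struct dyn_entry *dyn, size_t max_entries,
                   int *extended)
{
  uint64_t addr = find_dyn_value (dyn, max_entries,
                                  TAG_DEBUG_DLMOPEN_PLACEHOLDER);
  *extended = addr != 0;
  if (addr == 0)
    addr = find_dyn_value (dyn, max_entries, TAG_DEBUG);
  return addr;
}
```

A demo without the new tag, as Florian suggests, would correspond to the fallback path only: live-process debugging through the _r_debug_dlmopen symbol works, while core files stay limited to what DT_DEBUG exposes.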