Message ID | CAPESumqMqVem6VvaKXf_ko1zpM9_wOXixp1O7WGtS0RnvhMSpg@mail.gmail.com |
---|---|
State | New |
On 04/26/2016 11:58 PM, d wk wrote:
> Hello libc developers,
>
> In a project of mine, I needed to run some code before any constructors
> from any system libraries (such as libc or libpthread). The linker/loader
> -z initfirst feature is perfect for this, but it only supports one shared
> library. Unfortunately libpthread also uses this feature (I assume the
> feature exists because pthread needed it), so my project was incompatible
> with libpthread.
>
> So, I wrote a small patch which changes the single dl_initfirst variable
> into a linked list. This patch does not change the size of any data
> structures (it's ABI compatible), just turns dl_initfirst into a list. The
> list is not freed (the allocator wouldn't free it anyway), and insertion
> into the list is quadratic, but I expect there will never be more than
> a handful of initfirst libraries!
>
> This patch records initfirst libraries in load order, so LD_PRELOAD
> libraries will have their constructors called before libpthread. If the
> opposite behaviour is desired, the LD_PRELOAD'd library can always declare
> a dependency on libpthread. Normally LD_PRELOAD constructors are run last,
> which is very inconvenient when trying to inject new functionality, and
> I expect anyone using -z initfirst with LD_PRELOAD to really want to run
> first. The patch is written against latest glibc 2.23 (I also tested on
> glibc 2.21, and it's not quite compatible with 2.17 since the other data
> structures changed).
>
> I was not the first person to run into this problem,
> someone wanted the same thing on stack overflow two years
> ago. You can see my answer there with a complete test case.
> http://stackoverflow.com/questions/19796383/linux-ld-preload-z-initfirst/36143861
>
> Hope you will accept this patch. Comments welcome. Thanks,

Hi,

I think that many debugging/profiling tools would want this feature (e.g.
for AddressSanitizer https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56393#c9).
But here are a few questions you may want to consider:
* is your solution compatible with DF_1_INITFIRST behavior on Solaris?
* should DF_1_INITFIRST also influence destruction order?
* what if initfirst library has some dependencies e.g. it needs malloc from
  Glibc or dlsym from libdl.so during construction (that's e.g.
  AddressSanitizer's case)? The current logic of initfirst is rather
  primitive as it does not track such dependencies at all.

-Y

> ~ dwk.
>
>
> ----[ cut here ]----
> Support -z initfirst for multiple shared libraries (run in load order).
>
> This is particularly useful when combined with LD_PRELOAD, as it is then
> possible to run constructors before any code in other libraries runs.
> ---
>  elf/dl-init.c              |  9 ++++++++-
>  elf/dl-load.c              | 19 ++++++++++++++++++-
>  elf/dl-support.c           |  4 ++--
>  sysdeps/generic/ldsodefs.h |  7 +++++--
>  4 files changed, 33 insertions(+), 6 deletions(-)
>
> diff --git a/elf/dl-init.c b/elf/dl-init.c
> index 818c3aa..da59d1f 100644
> --- a/elf/dl-init.c
> +++ b/elf/dl-init.c
> @@ -84,7 +84,14 @@ _dl_init (struct link_map *main_map, int argc, char **argv, char **env)
>
>    if (__glibc_unlikely (GL(dl_initfirst) != NULL))
>      {
> -      call_init (GL(dl_initfirst), argc, argv, env);
> +      struct initfirst_list *initfirst;
> +      for(initfirst = GL(dl_initfirst); initfirst; initfirst = initfirst->next)
> +        {
> +          call_init (initfirst->which, argc, argv, env);
> +        }
> +
> +      /* We do not try to free this list, as the memory will not be reclaimed
> +         by the allocator unless there were no intervening malloc()'s. */
>        GL(dl_initfirst) = NULL;
>      }
>
> diff --git a/elf/dl-load.c b/elf/dl-load.c
> index c0d6249..1efabbf 100644
> --- a/elf/dl-load.c
> +++ b/elf/dl-load.c
> @@ -1388,7 +1388,24 @@ cannot enable executable stack as shared object requires");
>
>    /* Remember whether this object must be initialized first. */
>    if (l->l_flags_1 & DF_1_INITFIRST)
> -    GL(dl_initfirst) = l;
> +    {
> +      struct initfirst_list *new_node = malloc(sizeof(*node));
> +      struct initfirst_list *it = GL(dl_initfirst);
> +      new_node->which = l;
> +      new_node->next = NULL;
> +
> +      /* We append to the end of the linked list. Whichever library was loaded
> +         first has higher initfirst priority. This means that LD_PRELOAD
> +         initfirst overrides initfirst in libraries linked normally. */
> +      if (!it)
> +        GL(dl_initfirst) = new_node;
> +      else
> +        {
> +          while (it->next)
> +            it = it->next;
> +          it->next = new_node;
> +        }
> +    }
>
>    /* Finally the file information. */
>    l->l_file_id = id;
> diff --git a/elf/dl-support.c b/elf/dl-support.c
> index c30194c..d8b8acc 100644
> --- a/elf/dl-support.c
> +++ b/elf/dl-support.c
> @@ -147,8 +147,8 @@ struct r_search_path_elem *_dl_all_dirs;
>  /* All directories after startup. */
>  struct r_search_path_elem *_dl_init_all_dirs;
>
> -/* The object to be initialized first. */
> -struct link_map *_dl_initfirst;
> +/* The list of objects to be initialized first. */
> +struct initfirst_list *_dl_initfirst;
>
>  /* Descriptor to write debug messages to. */
>  int _dl_debug_fd = STDERR_FILENO;
> diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
> index ddec0be..198c089 100644
> --- a/sysdeps/generic/ldsodefs.h
> +++ b/sysdeps/generic/ldsodefs.h
> @@ -326,8 +326,11 @@ struct rtld_global
>    /* Incremented whenever something may have been added to dl_loaded. */
>    EXTERN unsigned long long _dl_load_adds;
>
> -  /* The object to be initialized first. */
> -  EXTERN struct link_map *_dl_initfirst;
> +  /* The list of objects to be initialized first. */
> +  EXTERN struct initfirst_list {
> +    struct link_map *which;
> +    struct initfirst_list *next;
> +  } *_dl_initfirst;
>
>  #if HP_SMALL_TIMING_AVAIL
>    /* Start time on CPU clock. */
>
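The list handling proposed in the patch above can be tried outside the loader. The following is only a sketch under stated assumptions: `struct object` is a hypothetical stand-in for glibc's `struct link_map`, and `initfirst_append` and `initfirst_order` are illustrative names that do not exist in glibc. It also sizes the allocation with `sizeof(*new_node)`; the posted hunk writes `sizeof(*node)`, which looks like a typo since no `node` is visible in that hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for glibc's struct link_map; only a name is kept. */
struct object { const char *name; };

/* Same shape as the node type proposed in the patch. */
struct initfirst_list {
  struct object *which;
  struct initfirst_list *next;
};

/* Append OBJ at the tail so the list stays in load order.
   Returns 0 on success, -1 if the allocation fails. */
static int initfirst_append(struct initfirst_list **head, struct object *obj)
{
  struct initfirst_list *new_node = malloc(sizeof(*new_node));
  if (new_node == NULL)
    return -1;
  new_node->which = obj;
  new_node->next = NULL;

  /* Walk to the terminating NULL link; this is linear per insertion,
     which is fine for a handful of entries. */
  struct initfirst_list **link = head;
  while (*link != NULL)
    link = &(*link)->next;
  *link = new_node;
  return 0;
}

/* Copy the names into ORDER in the sequence the loader would call
   call_init() on them; returns how many names were written. */
static size_t initfirst_order(const struct initfirst_list *head,
                              const char **order, size_t cap)
{
  size_t n = 0;
  for (const struct initfirst_list *it = head; it != NULL; it = it->next)
    if (n < cap)
      order[n++] = it->which->name;
  return n;
}
```

Appending a preloaded object first and libpthread second yields the preloaded object at index 0, which matches the "load order" behaviour described in the patch comment.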
Good points!

On Wed, Apr 27, 2016 at 3:03 AM, Yury Gribov <y.gribov@samsung.com> wrote:
> Hi,
>
> I think that many debugging/profiling tools would want this feature (e.g.
> for AddressSanitizer https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56393#c9).
> But here are a few questions you may want to consider:
> * is your solution compatible with DF_1_INITFIRST behavior on Solaris?

From reading the Solaris ld.so.1(1) man page, they say that initfirst
"marks the object so that its runtime initialization occurs before the
runtime initialization of any other objects brought into the process at
the same time". Here LD_PRELOAD objects are arguably brought into the
process earlier, as described by their ld(1) man page. So we ought to
ensure that LD_PRELOAD initfirst libraries are initialized before all
other LD_PRELOAD libraries, and also that normal initfirst libraries
are initialized before all other normal libraries. Actually we will run
LD_PRELOAD initfirst, then normal initfirst, then normal constructors,
then normal LD_PRELOAD constructors (running these last is the default
behaviour without initfirst). So although it's a bit complicated, I think
the behaviour of this patch is compatible with Solaris.

> * should DF_1_INITFIRST also influence destruction order?

Solaris does: "object runtime finalization will occur after the runtime
finalization of any other objects removed from the process at the same
time". glibc's previous initfirst did not do this (I guess pthread didn't
need a destructor). In general I think this is much less important, it's
usually only used for proper cleanup. The destructor code in dl-fini.c
also looks more complicated to adapt, but I can try if this is deemed
important. It seems like an orthogonal issue to me.

> * what if initfirst library has some dependencies e.g. it needs malloc from
> Glibc or dlsym from libdl.so during construction (that's e.g.
> AddressSanitizer's case)? The current logic of initfirst is rather primitive
> as it does not track such dependencies at all.

Unfortunately, libpthread depends on libc -- yet it uses initfirst to get
initialized before libc. In a way, we cannot satisfy the constraints for
initfirst (to paraphrase Solaris, an initfirst library is initialized
before the initialization of other libraries present at load-time) and
also allow the initfirst library to have dependencies like this. It's a
contradiction and it just makes the loading process less deterministic. The
developers just have to make sure that the constructors do not call
any functions from libraries that haven't been initialized yet (or call
functions that don't care about initialization).

In my own system, I needed libc functionality. What I did was write
a minimal library which had -z initfirst, and reimplement malloc,
read, write, and whatever else I needed. This library would pass
off its data structures to another shared library, which really was
depending on libc and got initialized later. The user would write
LD_PRELOAD=libstage1.so:libstage2.so. My code had the requirement that it
had a constructor called very early, and another constructor called late,
however. In the simpler case where the debugging/profiling tool developer
needs to run some code early, then some code later which depends on libc
(but doesn't need constructing), it can be done from within a single
library, as libpthread currently does.

(I didn't try this, but maybe it could be arranged that calling malloc()
before libc is initialized uses the loader's own watermark allocator? The
loader itself has a similar dilemma, of course, and it uses its own malloc
until libc's becomes available...)

libdl is kind of a special case because it is so closely tied to the
loader. In my system, I ended up parsing the ELF headers from loaded
libraries to look up symbols. It's fairly simple to reproduce what the
loader is doing and walk its data structure to find load addresses.
I think, again, the best way to handle an initfirst library's dependency
on libdl would be to expose the loader's symbol map so that the library
could call loader functions if it really wanted to. A lot of libdl's
functionality (like dlopen'ing new libraries) just gets confusing at
initfirst time.

If we really wanted to honour these dependencies, we certainly could. I'm
just not sure it's what tool developers want.

-dwk.
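As a point of comparison for the "walk the loader's data structures" approach described in the message above, glibc exposes `dl_iterate_phdr`, which visits every loaded object and its program headers. The sketch below only counts objects and `PT_LOAD` segments; `walk_state`, `count_object` and `walk_loaded_objects` are illustrative names, and whether this call is safe from inside an initfirst constructor is an open assumption that the thread does not settle.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* Totals gathered while walking the loaded objects. */
struct walk_state {
  size_t objects;        /* number of loaded ELF objects seen */
  size_t load_segments;  /* number of PT_LOAD program headers seen */
};

/* Callback invoked by dl_iterate_phdr once per loaded object. */
static int count_object(struct dl_phdr_info *info, size_t size, void *data)
{
  struct walk_state *st = data;
  (void) size;
  st->objects++;
  /* info->dlpi_addr is the load bias; each PT_LOAD header describes one
     mapped segment at dlpi_addr + p_vaddr. */
  for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++)
    if (info->dlpi_phdr[i].p_type == PT_LOAD)
      st->load_segments++;
  return 0;  /* returning non-zero would stop the iteration */
}

/* Walk all objects currently known to the loader. */
static struct walk_state walk_loaded_objects(void)
{
  struct walk_state st = { 0, 0 };
  dl_iterate_phdr(count_object, &st);
  return st;
}
```

A symbol lookup would go one step further and locate `PT_DYNAMIC` in the same loop, then read the dynamic symbol table from it; that part is left out here.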
On 04/27/2016 04:57 PM, d wk wrote:
> Good points!
>
> On Wed, Apr 27, 2016 at 3:03 AM, Yury Gribov <y.gribov@samsung.com> wrote:
>> Hi,
>>
>> I think that many debugging/profiling tools would want this feature (e.g.
>> for AddressSanitizer https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56393#c9).
>> But here are a few questions you may want to consider:
>> * is your solution compatible with DF_1_INITFIRST behavior on Solaris?
>
> From reading the Solaris ld.so.1(1) man page, they say that initfirst
> "marks the object so that its runtime initialization occurs before the
> runtime initialization of any other objects brought into the process at
> the same time". Here LD_PRELOAD objects are arguably brought into the
> process earlier, as described by their ld(1) man page. So we ought to
> ensure that LD_PRELOAD initfirst libraries are initialized before all
> other LD_PRELOAD libraries, and also that normal initfirst libraries
> are initialized before all other normal libraries. Actually we will run
> LD_PRELOAD initfirst, then normal initfirst, then normal constructors,
> then normal LD_PRELOAD constructors (running these last is the default
> behaviour without initfirst). So although it's a bit complicated, I think
> the behaviour of this patch is compatible with Solaris.

Agreed.

>> * should DF_1_INITFIRST also influence destruction order?
>
> Solaris does: "object runtime finalization will occur after the runtime
> finalization of any other objects removed from the process at the same
> time". glibc's previous initfirst did not do this (I guess pthread didn't
> need a destructor). In general I think this is much less important, it's
> usually only used for proper cleanup. The destructor code in dl-fini.c
> also looks more complicated to adapt, but I can try if this is deemed
> important. It seems like an orthogonal issue to me.
>
>
>> * what if initfirst library has some dependencies e.g. it needs malloc from
>> Glibc or dlsym from libdl.so during construction (that's e.g.
>> AddressSanitizer's case)?
>> The current logic of initfirst is rather primitive
>> as it does not track such dependencies at all.
>
> Unfortunately, libpthread depends on libc -- yet it uses initfirst to get
> initialized before libc.

Yeah. It seems that initfirst is a crude hack which bypasses all
dependency tracking. I wonder if there's a place for another,
hopefully saner, dependency-respecting flag.

> In a way, we cannot satisfy the constraints for
> initfirst (to paraphrase Solaris, an initfirst library is initialized
> before the initialization of other libraries present at load-time) and
> also allow the initfirst library to have dependencies like this. It's a
> contradiction and it just makes the loading process less deterministic. The
> developers just have to make sure that the constructors do not call
> any functions from libraries that haven't been initialized yet (or call
> functions that don't care about initialization).
>
> In my own system, I needed libc functionality. What I did was write
> a minimal library which had -z initfirst, and reimplement malloc,
> read, write, and whatever else I needed.

That's a possible approach, but requiring all tool developers to do the
same seems like overkill, as they'll typically need to reimplement a good
part of I/O, getenv(), an ELF symtab parser and a (primitive) memory
allocator. There seems to be no way around that given the current
primitive DF_1_INITFIRST semantics, so I wonder if a better approach would
be to throw in a completely different dynamic flag for more precise
control over library initialization order.

> This library would pass
> off its data structures to another shared library, which really was
> depending on libc and got initialized later. The user would write
> LD_PRELOAD=libstage1.so:libstage2.so. My code had the requirement that it
> had a constructor called very early, and another constructor called late,
> however.
> In the simpler case where the debugging/profiling tool developer
> needs to run some code early, then some code later which depends on libc
> (but doesn't need constructing), it can be done from within a single
> library. As libpthread currently is doing.
>
> (I didn't try this, but maybe it could be arranged that calling malloc()
> before libc is initialized uses the loader's own watermark allocator? The
> loader itself has a similar dilemma, of course, and it uses its own malloc
> until libc's becomes available...)
>
> libdl is kind of a special case because it is so closely tied to the
> loader. In my system, I ended up parsing the ELF headers from loaded
> libraries to look up symbols. It's fairly simple to reproduce what the
> loader is doing and walk its data structure to find load addresses. I
> think, again, the best way to handle an initfirst library's dependency
> on libdl would be to expose the loader's symbol map so that the library
> could call loader functions if it really wanted to. A lot of libdl's
> functionality (like dlopen'ing new libraries) just gets confusing at
> initfirst time.
>
> If we really wanted to honour these dependencies, we certainly could. I'm
> just not sure it's what tool developers want.

I myself am pretty sure that people would generally prefer to avoid
reimplementing parts of Glibc (the symbol resolver in particular). Let's
see if Kostya has something to say.
On 04/26/2016 04:58 PM, d wk wrote:
> In a project of mine, I needed to run some code before any constructors
> from any system libraries (such as libc or libpthread). The linker/loader
> -z initfirst feature is perfect for this, but it only supports one shared
> library. Unfortunately libpthread also uses this feature (I assume the
> feature exists because pthread needed it), so my project was incompatible
> with libpthread.
>
> So, I wrote a small patch which changes the single dl_initfirst variable
> into a linked list. This patch does not change the size of any data
> structures (it's ABI compatible), just turns dl_initfirst into a list. The
> list is not freed (the allocator wouldn't free it anyway), and insertion
> into the list is quadratic, but I expect there will never be more than
> a handful of initfirst libraries!
>
> This patch records initfirst libraries in load order, so LD_PRELOAD
> libraries will have their constructors called before libpthread. If the
> opposite behaviour is desired, the LD_PRELOAD'd library can always declare
> a dependency on libpthread. Normally LD_PRELOAD constructors are run last,
> which is very inconvenient when trying to inject new functionality, and
> I expect anyone using -z initfirst with LD_PRELOAD to really want to run
> first. The patch is written against latest glibc 2.23 (I also tested on
> glibc 2.21, and it's not quite compatible with 2.17 since the other data
> structures changed).
>
> I was not the first person to run into this problem,
> someone wanted the same thing on stack overflow two years
> ago. You can see my answer there with a complete test case.
> http://stackoverflow.com/questions/19796383/linux-ld-preload-z-initfirst/36143861
>
> Hope you will accept this patch. Comments welcome. Thanks,

(1) High level review: I like the idea.

At a high level I like the idea. We have one open bug for this here:

Bug 14379 - shared object constructors are called in the wrong order
https://sourceware.org/bugzilla/show_bug.cgi?id=14379

It is true that LD_PRELOAD'd libraries are initialized last, but only
because of a quirk in the processing mechanics. The library is first in
the scope search order (so the symbols interpose), which means it's
initialized last, and finalized first after the application.

To be honest I think this is a bug and LD_PRELOAD'd libraries should be
initialized and finalized based on their dependencies.

(2) DF_1_INITFIRST

Implementing DF_1_INITFIRST is going to be complicated.

It is complicated by the fact that dlopen could result in N libraries
being loaded, and if M of those are DF_1_INITFIRST, then those M must be
initialized first. This is my interpretation of the Solaris implementation
and the reason for the wording "at the same time."

I can't look at your patch to review this though (copyright issues).

(3) Finalization of DF_1_INITFIRST and initfirst libraries.

Ignore DF_1_INITFIRST.

Finalization for initfirst libraries has to happen after the finalization
of all other libraries. This makes the behaviour deterministic and
symmetric.

(4) Copyright status?

Please clarify your copyright status with the project:
https://sourceware.org/glibc/wiki/Contribution%20checklist#FSF_copyright_Assignment

I don't see your name on the copyright assignment documents from the FSF.

The easiest form is:
http://git.savannah.gnu.org/cgit/gnulib.git/plain/doc/Copyright/request-assign.future

It assigns all current and future contributions to glibc to the FSF and
allows us to accept them immediately.

Cheers,
Carlos.
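Points (2) and (3) of the review above amount to an ordering rule for one batch of objects brought in together: the flagged ones initialize first, and finalization mirrors initialization so the flagged ones finish last. A minimal sketch of that rule follows, assuming a hypothetical `struct obj` with an `initfirst` flag; it ignores dependency ordering within each group, which a real loader would still have to respect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one object in a batch loaded "at the same time". */
struct obj {
  const char *name;
  bool initfirst;  /* stands in for DF_1_INITFIRST being set */
};

/* Fill OUT with pointers to the N objects of BATCH in initialization
   order: flagged objects first, then the rest, each group keeping its
   original relative order (a stable partition). */
static void init_order(const struct obj *batch, size_t n,
                       const struct obj **out)
{
  size_t k = 0;
  for (size_t i = 0; i < n; i++)
    if (batch[i].initfirst)
      out[k++] = &batch[i];
  for (size_t i = 0; i < n; i++)
    if (!batch[i].initfirst)
      out[k++] = &batch[i];
}

/* Finalization order is the exact reverse of initialization order, so
   the flagged objects are finalized after everything else. */
static void fini_order(const struct obj **init, size_t n,
                       const struct obj **out)
{
  for (size_t i = 0; i < n; i++)
    out[i] = init[n - 1 - i];
}
```

For a batch `a, b*, c, d*` (starred entries flagged) this gives initialization order `b, d, a, c` and finalization order `c, a, d, b`.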
On 04/27/2016 01:22 PM, Yury Gribov wrote:
> Yeah. It seems that initfirst is a crude hack which bypasses all
> dependency tracking. I wonder if there's a place for another,
> hopefully saner, dependency-respecting flag.

What use cases do you need to support?

How would a dependency-respecting initfirst-like flag work given the
conflicting requirements to initialize in dependency order and yet
not initialize in dependency order?

Today we have:

- Library initializers and finalizers (Run in dep order)
- Library constructors and destructors (Run in dep order)
- Prioritized constructors and destructors (Run in dep order, and # order)
- LD_PRELOAD initializers run last, before the application is initialized.
- Non-zero initialized data from .data in the ELF image.

Dependency ordering also includes symbol dependencies and relocation
dependencies, sorting all objects into a linear list and breaking
cycles where appropriate at deterministic points (though we need
a better ldd to show these problems).

It might be better if we had some kind of invariant assertions like
"abort if I'm not initialized before library SONAME" that then allowed
developers to realize their invariant is wrong and restructure the
application. Alternatively if we had some better tooling that might
help also (I'm working on an alternate eu-ldd and deterministic
cycle breaking in ld.so, but it's a long way off).
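The "prioritized constructors" entry in the list above refers to the GCC/Clang `constructor(N)` attribute, where a lower number runs earlier within one object. A small sketch of that ordering is shown below; the logging helper and function names are made up for illustration, and priorities up to 100 are avoided because GCC reserves them for the implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Zero-initialized storage lives in .bss, so it is usable before any
   constructor runs. */
static const char *init_log[4];
static size_t init_count;

/* Record which constructor ran, in the order they were called. */
static void record(const char *what)
{
  if (init_count < 4)
    init_log[init_count++] = what;
}

/* Declared in the "wrong" order on purpose: the priority number, not
   the source order, decides which one runs first. */
__attribute__((constructor(102))) static void second_stage(void)
{
  record("prio-102");
}

__attribute__((constructor(101))) static void first_stage(void)
{
  record("prio-101");
}
```

By the time `main` starts, both constructors have run and the log holds `prio-101` followed by `prio-102`. This only orders constructors inside a single object; ordering across shared objects is still decided by the loader, which is the subject of this thread.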
On 28/04/16 16:27, Carlos O'Donell wrote:
> On 04/27/2016 01:22 PM, Yury Gribov wrote:
>> Yeah. It seems that initfirst is a crude hack which bypasses all
>> dependency tracking. I wonder if there's a place for another,
>> hopefully saner, dependency-respecting flag.
>
> What use cases do you need to support?
>
> How would a dependency-respecting initfirst-like flag work given the
> conflicting requirements to initialize in dependency order and yet
> not initialize in dependency order?
>

maybe use a topological sort order where the flagged module
comes at the earliest possible position?

> Today we have:
>
> - Library initializers and finalizers (Run in dep order)
> - Library constructors and destructors (Run in dep order)
> - Prioritized constructors and destructors (Run in dep order, and # order)
> - LD_PRELOAD initializer run last before the application is initialized.
> - Non-zero initialized data from .data in the ELF image.
>
> Dependency ordering also includes symbol dependencies and relocation
> dependencies, sorting all objects into a linear list and breaking

if that was done correctly then i think LD_PRELOAD libs would
come before the modules using the interposed symbols.

> cycles where appropriate at deterministic points (though we need
> a better ldd to show these problems).
>
> It might be better if we had some kind of invariant assertions like
> "abort if I'm not initialized before library SONAME" that then allowed
> developers to realize their invariant is wrong and restructure the
> application. Alternatively if we had some better tooling that might
> help also (I'm working on an alternate eu-ldd and deterministic
> cycle breaking in ld.so, but it's a long way off).
>

depending on elf ctor ordering sounds broken to me.
(incompatible with static linking)

depending on deterministic ordering in the presence of cycles
sounds even more nonsensical. (the dynamic linker shouldn't
try to solve NP-hard problems.)
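The topological-sort suggestion in the reply above can be sketched as a depth-first post-order walk that starts from the flagged modules: each flagged module is then preceded only by its own transitive dependencies. This is an illustration, not glibc's sorting code; `struct graph`, `visit` and `dep_init_order` are hypothetical names, and the graph is a small fixed-size adjacency matrix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJS 8

/* dep[a][b] means object a depends on object b, so b must be
   initialized before a. */
struct graph {
  size_t n;
  bool dep[MAX_OBJS][MAX_OBJS];
  bool flagged[MAX_OBJS];  /* the hypothetical "init early" flag */
};

/* Post-order visit: emit V only after everything it depends on. */
static void visit(const struct graph *g, size_t v, bool *done,
                  size_t *order, size_t *len)
{
  if (done[v])
    return;
  /* Marking before recursing also keeps the walk from looping on a
     dependency cycle; members of a cycle then come out in visit order. */
  done[v] = true;
  for (size_t d = 0; d < g->n; d++)
    if (g->dep[v][d])
      visit(g, d, done, order, len);
  order[(*len)++] = v;
}

/* Write an initialization order into ORDER: flagged objects (with
   their dependencies) first, then everything else. Returns the count. */
static size_t dep_init_order(const struct graph *g, size_t *order)
{
  bool done[MAX_OBJS] = { false };
  size_t len = 0;
  for (size_t v = 0; v < g->n; v++)
    if (g->flagged[v])
      visit(g, v, done, order, &len);
  for (size_t v = 0; v < g->n; v++)
    visit(g, v, done, order, &len);
  return len;
}
```

With objects `0 = app`, `1 = libc`, `2 = libtool` (flagged, depends on libc) and `3 = libfoo` (depends on libc), where the app depends on all three, the order comes out as `libc, libtool, libfoo, app`: the flagged library runs as early as its dependency on libc allows.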
On 04/27/2016 08:31 PM, Kostya Serebryany wrote: > On Wed, Apr 27, 2016 at 10:22 AM, Yury Gribov <y.gribov@samsung.com> wrote: > >> On 04/27/2016 04:57 PM, d wk wrote: >> >>> Good points! >>> >>> On Wed, Apr 27, 2016 at 3:03 AM, Yury Gribov <y.gribov@samsung.com> >>> wrote: >>> >>>> On 04/26/2016 11:58 PM, d wk wrote: >>>> >>>>> >>>>> Hello libc developers, >>>>> >>>>> In a project of mine, I needed to run some code before any constructors >>>>> from any system libraries (such as libc or libpthread). The >>>>> linker/loader >>>>> -z initfirst feature is perfect for this, but it only supports one >>>>> shared >>>>> library. Unfortunately libpthread also uses this feature (I assume the >>>>> feature exists because pthread needed it), so my project was >>>>> incompatible >>>>> with libpthread. >>>>> >>>>> So, I wrote a small patch which changes the single dl_initfirst variable >>>>> into a linked list. This patch does not change the size of any data >>>>> structures (it's ABI compatible), just turns dl_initfirst into a list. >>>>> The >>>>> list is not freed (the allocator wouldn't free it anyway), and insertion >>>>> into the list is quadratic, but I expect there will never be more than >>>>> a handful of initfirst libraries! >>>>> >>>>> This patch records initfirst libraries in load order, so LD_PRELOAD >>>>> libraries will have their constructors called before libpthread. If the >>>>> opposite behaviour is desired, the LD_PRELOAD'd library can always >>>>> declare >>>>> a dependency on libpthread. Normally LD_PRELOAD constructors are run >>>>> last, >>>>> which is very inconvenient when trying to inject new functionality, and >>>>> I expect anyone using -z initfirst with LD_PRELOAD to really want to run >>>>> first. The patch is written against latest glibc 2.23 (I also tested on >>>>> glibc 2.21, and it's not quite compatible with 2.17 since the other data >>>>> structures changed). 
>>>>> >>>>> I was not the first person to run into this problem, >>>>> someone wanted the same thing on stack overflow two years >>>>> ago. You can see my answer there with a complete test case. >>>>> >>>>> >>>>> http://stackoverflow.com/questions/19796383/linux-ld-preload-z-initfirst/36143861 >>>>> >>>>> Hope you will accept this patch. Comments welcome. Thanks, >>>>> >>>> >>>> >>>> Hi, >>>> >>>> I think that many debugging/profiling tools would want this feature (e.g. >>>> for AddressSanitizer >>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56393#c9). >>>> But here are few questions you may want to consider: >>>> * is your solution compatible with DF_1_INITFIRST behavior on Solaris? >>>> >>> >>> From reading the Solaris ld.so.1(1) man page, they say that initfirst >>>> >>> "marks the object so that its runtime initialization occurs before the >>> runtime initialization of any other objects brought into the process at >>> the same time". Here LD_PRELOAD objects are arguably brought into the >>> process earlier, as described by their ld(1) man page. So we ought to >>> ensure that LD_PRELOAD initfirst libraries are initialized before all >>> other LD_PRELOAD libraries, and also that normal initfirst libraries >>> are initialized before all other normal libraries. Actually we will run >>> LD_PRELOAD initfirst, then normal initfirst, then normal constructors, >>> then normal LD_PRELOAD constructors (running these last is the default >>> behaviour without initfirst). So although it's a bit complicated, I think >>> the behaviour of this patch is compatible with Solaris. >>> >> >> Agreed. >> >> * should DF_1_INITFIRST also influence destruction order? >>>> >>> >>> Solaris does: "object runtime finalization will occur after the runtime >>> finalization of any other objects removed from the process at the same >>> time". glibc's previous initfirst did not do this (I guess pthread didn't >>> need a destructor). 
In general I think this is much less important, it's >>> usually only used for proper cleanup. The destructor code in dl-fini.c >>> also looks more complicated to adapt, but I can try if this is deemed >>> important. It seems like an orthogonal issue to me. >>> >>> >>> * what if initfirst library has some dependencies e.g. it needs malloc >>>> from >>>> Glibc or dlsym from libdl.so during construction (that's e.g. >>>> AddressSanitizer's case)? The current logic of initfirst is rather >>>> primitive >>>> as it does not track such dependencies at all. >>>> >>> >>> Unfortunately, libpthread depends on libc -- yet it uses initfirst to get >>> initialized before libc. >>> >> >> Yeah. It seems that initfirst is a crude hack which bypasses all >> dependency tracking. I wonder if there's a place for another, hopefully >> saner, dependency-respecting flag. >> >> In a way, we cannot satisfy the constraints for >>> initfirst (to paraphrase Solaris, an initfirst library is initialized >>> before the initialization of other libraries present at load-time) and >>> also allow the initfirst library to have dependencies like this. It's a >>> contradiction and it just makes the loading process less deterministic. >>> The >>> developers just have to make sure that the constructors do not call >>> any functions from libraries that haven't been initialized yet (or call >>> functions that don't care about initialization). >>> >>> In my own system, I needed libc functionality. What I did was write >>> a minimal library which had -z initfirst, and reimplement malloc, >>> read, write, and whatever else I needed. >>> >> >> That's possible approach but requiring all tools developers to do the same >> seems like an overkill as they'll typically need to reimplement good part >> of IO, getenv(), ELF symtab parser and (primitive) memory allocator. 
>> >> There seems to be no way around that given the current primitive >> DF_1_INITFIRST semantics, so I wonder if a better approach would be to >> throw in a completely different dynamic flag for more precise control over >> library initialization order. >> >> This library would pass >>> off its data structures to another shared library, which really was >>> depending on libc and got initialized later. The user would write >>> LD_PRELOAD=libstage1.so:libstage2.so. My code had the requirement that it >>> had a constructor called very early, and another constructor called late, >>> however. In the simpler case where the debugging/profiling tool developer >>> needs to run some code early, then some code later which depends on libc >>> (but doesn't need constructing), it can be done from within a single >>> library. As libpthread currently is doing. >>> >>> (I didn't try this, but maybe it could be arranged that calling malloc() >>> before libc is initialized uses the loader's own watermark allocator? The >>> loader itself has a similar dilemma, of course, and it uses its own malloc >>> until libc's becomes available...) >>> >>> libdl is kind of a special case because it is so closely tied to the >>> loader. In my system, I ended up parsing the ELF headers from loaded >>> libraries to look up symbols. It's fairly simple to reproduce what the >>> loader is doing and walk its data structure to find load addresses. I >>> think, again, the best way to handle an initfirst library's dependency >>> on libdl would be to expose the loader's symbol map so that the library >>> could call loader functions if it really wanted to. A lot of libdl's >>> functionality (like dlopen'ing new libraries) just gets confusing at >>> initfirst time. >>> >>> If we really wanted to honour these dependencies, we certainly could. I'm >>> just not sure it's what tool developers want. 
>>> >> >> I'm myself pretty sure that people would generally prefer to avoid >> reimplementing parts of Glibc (symbol resolver in particular). Let's see if >> Kostya has something to say. > > > Hm? To say about what? :) > We almost never use asan as a DSO on Linux, so we don't get any problems > like this. Just to clarify: you mean you don't ASan DSO dependencies (libc, libpthread, librt, etc.) to be initialized before ASan initialization (i.e. __asan_init) runs? In that case the OP's patch would work for ASan DSO. > > >> >> >> -dwk. >>> >>> >>>> -Y >>>> >>>> >>>> ~ dwk. >>>>> >>>>> >>>>> ----[ cut here ]---- >>>>> Support -z initfirst for multiple shared libraries (run in load order). >>>>> >>>>> This is particularly useful when combined with LD_PRELOAD, as it is then >>>>> possible to run constructors before any code in other libraries runs. >>>>> --- >>>>> elf/dl-init.c | 9 ++++++++- >>>>> elf/dl-load.c | 19 ++++++++++++++++++- >>>>> elf/dl-support.c | 4 ++-- >>>>> sysdeps/generic/ldsodefs.h | 7 +++++-- >>>>> 4 files changed, 33 insertions(+), 6 deletions(-) >>>>> >>>>> diff --git a/elf/dl-init.c b/elf/dl-init.c >>>>> index 818c3aa..da59d1f 100644 >>>>> --- a/elf/dl-init.c >>>>> +++ b/elf/dl-init.c >>>>> @@ -84,7 +84,14 @@ _dl_init (struct link_map *main_map, int argc, char >>>>> **argv, char **env) >>>>> >>>>> if (__glibc_unlikely (GL(dl_initfirst) != NULL)) >>>>> { >>>>> - call_init (GL(dl_initfirst), argc, argv, env); >>>>> + struct initfirst_list *initfirst; >>>>> + for(initfirst = GL(dl_initfirst); initfirst; initfirst = >>>>> initfirst->next) >>>>> + { >>>>> + call_init (initfirst->which, argc, argv, env); >>>>> + } >>>>> + >>>>> + /* We do not try to free this list, as the memory will not be >>>>> reclaimed >>>>> + by the allocator unless there were no intervening malloc()'s. 
>>>>> */ >>>>> GL(dl_initfirst) = NULL; >>>>> } >>>>> >>>>> diff --git a/elf/dl-load.c b/elf/dl-load.c >>>>> index c0d6249..1efabbf 100644 >>>>> --- a/elf/dl-load.c >>>>> +++ b/elf/dl-load.c >>>>> @@ -1388,7 +1388,24 @@ cannot enable executable stack as shared object >>>>> requires"); >>>>> >>>>> /* Remember whether this object must be initialized first. */ >>>>> if (l->l_flags_1 & DF_1_INITFIRST) >>>>> - GL(dl_initfirst) = l; >>>>> + { >>>>> + struct initfirst_list *new_node = malloc(sizeof(*node)); >>>>> + struct initfirst_list *it = GL(dl_initfirst); >>>>> + new_node->which = l; >>>>> + new_node->next = NULL; >>>>> + >>>>> + /* We append to the end of the linked list. Whichever library was >>>>> loaded >>>>> + first has higher initfirst priority. This means that >>>>> LD_PRELOAD >>>>> + initfirst overrides initfirst in libraries linked normally. >>>>> */ >>>>> + if (!it) >>>>> + GL(dl_initfirst) = new_node; >>>>> + else >>>>> + { >>>>> + while (it->next) >>>>> + it = it->next; >>>>> + it->next = new_node; >>>>> + } >>>>> + } >>>>> >>>>> /* Finally the file information. */ >>>>> l->l_file_id = id; >>>>> diff --git a/elf/dl-support.c b/elf/dl-support.c >>>>> index c30194c..d8b8acc 100644 >>>>> --- a/elf/dl-support.c >>>>> +++ b/elf/dl-support.c >>>>> @@ -147,8 +147,8 @@ struct r_search_path_elem *_dl_all_dirs; >>>>> /* All directories after startup. */ >>>>> struct r_search_path_elem *_dl_init_all_dirs; >>>>> >>>>> -/* The object to be initialized first. */ >>>>> -struct link_map *_dl_initfirst; >>>>> +/* The list of objects to be initialized first. */ >>>>> +struct initfirst_list *_dl_initfirst; >>>>> >>>>> /* Descriptor to write debug messages to. 
*/ >>>>> int _dl_debug_fd = STDERR_FILENO; >>>>> diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h >>>>> index ddec0be..198c089 100644 >>>>> --- a/sysdeps/generic/ldsodefs.h >>>>> +++ b/sysdeps/generic/ldsodefs.h >>>>> @@ -326,8 +326,11 @@ struct rtld_global >>>>> /* Incremented whenever something may have been added to dl_loaded. >>>>> */ >>>>> EXTERN unsigned long long _dl_load_adds; >>>>> >>>>> - /* The object to be initialized first. */ >>>>> - EXTERN struct link_map *_dl_initfirst; >>>>> + /* The list of objects to be initialized first. */ >>>>> + EXTERN struct initfirst_list { >>>>> + struct link_map *which; >>>>> + struct initfirst_list *next; >>>>> + } *_dl_initfirst; >>>>> >>>>> #if HP_SMALL_TIMING_AVAIL >>>>> /* Start time on CPU clock. */ >>>>> >>>>> >>>> >>> >>> >> >
On 04/28/2016 06:54 PM, Szabolcs Nagy wrote: > On 28/04/16 16:27, Carlos O'Donell wrote: >> On 04/27/2016 01:22 PM, Yury Gribov wrote: >>> Yeah. It seems that initfirst is a crude hack which bypasses all >>> dependency tracking. I wonder if there's a place for another, >>> hopefully saner, dependency-respecting flag. >> >> What use cases do you need to support? >> >> How would a dependency-respecting initfirst-like flag work given the >> conflicting requirements to initialize in dependency order and yet >> not initialize in dependency order? >> > > may be use a topological sort order where the > flagged module comes earliest possible? That was my expectation as well. >> Today we have: >> >> - Library initializers and finalizers (Run in dep order) >> - Library constructors and destructors (Run in dep order) >> - Prioritized constructors and destructors (Run in dep order, and # order) >> - LD_PRELOAD initializer run last before the application is initialized. >> - Non-zero initialized data from .data in the ELF image. >> >> Dependency ordering also include symbol dependencies and relocation >> dependencies, sorting all objects into a linear list and breaking > > if that was done correctly then i think LD_PRELOAD > libs would come before the modules using the interposed > symbols. > >> cycles where appropriate at deterministic points (though we need >> a better ldd to show these problems). >> >> It might be better if we had some kind of invariant assertions like >> "abort if I'm not initialized before library SONAME" that then allowed >> developers to realize their invariant is wrong and restructure the >> application. Alternatively if we had some better tooling that might >> help also (I'm working on an alternate eu-ldd and deterministic >> cycle breaking in ld.so, but it's a long way off). >> > > depending on elf ctor ordering sounds broken to me. 
> (incompatible with static linking) > > depending on deterministic ordering in the presence > of cycles sounds even more nonsensical. > (the dynamic linker shouldn't try to solve np hard > problems.) > > >
On Thu, Apr 28, 2016 at 10:12 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 04/26/2016 04:58 PM, d wk wrote:
>> In a project of mine, I needed to run some code before any constructors
>> from any system libraries (such as libc or libpthread). The linker/loader
>> -z initfirst feature is perfect for this, but it only supports one shared
>> library. Unfortunately libpthread also uses this feature (I assume the
>> feature exists because pthread needed it), so my project was incompatible
>> with libpthread.
>>
>> So, I wrote a small patch which changes the single dl_initfirst variable
>> into a linked list. This patch does not change the size of any data
>> structures (it's ABI compatible), just turns dl_initfirst into a list. The
>> list is not freed (the allocator wouldn't free it anyway), and insertion
>> into the list is quadratic, but I expect there will never be more than
>> a handful of initfirst libraries!
>>
>> This patch records initfirst libraries in load order, so LD_PRELOAD
>> libraries will have their constructors called before libpthread. If the
>> opposite behaviour is desired, the LD_PRELOAD'd library can always declare
>> a dependency on libpthread. Normally LD_PRELOAD constructors are run last,
>> which is very inconvenient when trying to inject new functionality, and
>> I expect anyone using -z initfirst with LD_PRELOAD to really want to run
>> first. The patch is written against latest glibc 2.23 (I also tested on
>> glibc 2.21, and it's not quite compatible with 2.17 since the other data
>> structures changed).
>>
>> I was not the first person to run into this problem,
>> someone wanted the same thing on stack overflow two years
>> ago. You can see my answer there with a complete test case.
>> http://stackoverflow.com/questions/19796383/linux-ld-preload-z-initfirst/36143861
>>
>> Hope you will accept this patch. Comments welcome. Thanks,
>
> (1) High level review: I like the idea.
>
> At a high level I like the idea.
>
> We have one open bug for this here:
>
> Bug 14379 - shared object constructors are called in the wrong order
> https://sourceware.org/bugzilla/show_bug.cgi?id=14379
>
> It is true that LD_PRELOAD'd libraries are loaded last, but only because
> of a quirk in the processing mechanics. The library is first in the scope
> search order (so the symbols interpose) which means it's initialized last,
> and finalized first after the application.
>
> To be honest I think this is a bug and LD_PRELOAD'd libraries should be
> initialized and finalized based on their dependencies.

I would be a little wary of changing existing LD_PRELOAD constructor
ordering behaviour. It might break some existing programs (certainly it
would break the system I created). The problem is that LD_PRELOAD is often
used for debugging, profiling, and overriding things; sometimes this means
you need to run very early, and sometimes this means you need to run very
late. As it happens, I needed both, hence this initfirst patch. If
LD_PRELOAD is changed as you suggest, even if this patch is accepted, there
will be no way to "run late" anymore.

I agree that the overall initfirst, LD_PRELOAD, etc. scheme could use some
adjustments; I was proposing this patch as a straightforward fix that could
be incorporated without breaking anything (because only libpthread is using
this feature right now). It would be helpful for me and potentially other
tool builders. I'm happy to help design larger-scale changes too.

>
> (2) DF_1_INITFIRST
>
> Implementing DF_1_INITFIRST is going to be complicated. It is complicated
> by the fact that dlopen could result in N libraries being loaded, and if
> M of those is DF_1_INITFIRST, then M must be initialized first. This
> is my interpretation of the Solaris implementation and the reason for
> the wording "at the same time."
>
> I can't look at your patch to review this though (copyright issues).
>
> (3) Finalization of DF_1_INITFIRST and initfirst libraries.
>
> Ignore DF_1_INITFIRST.
>
> Finalization for initfirst libraries has to happen after the
> finalization of all other libraries. This makes the behaviour
> deterministic and symmetric.

Not currently implemented, as I said, and it's orthogonal to the problem
of supporting multiple libraries -- but I agree that this should be done.

>
> (4) Copyright status?
>
> Please clarify your copyright status with the project:
> https://sourceware.org/glibc/wiki/Contribution%20checklist#FSF_copyright_Assignment
>
> I don't see your name on the copyright assignment documents
> from the FSF.
>
> The easiest form is:
> http://git.savannah.gnu.org/cgit/gnulib.git/plain/doc/Copyright/request-assign.future
>
> It assigns all current and future contributions to glibc to the FSF and
> allows us to accept them immediately.

I sent the form in; hopefully it'll be processed soon. I am a researcher
and there are unlikely to be any issues in this regard.

>
> Cheers,
> Carlos.
On Thu, Apr 28, 2016 at 12:58 PM, Yury Gribov <y.gribov@samsung.com> wrote:
> On 04/28/2016 06:54 PM, Szabolcs Nagy wrote:
>>
>> On 28/04/16 16:27, Carlos O'Donell wrote:
>>>
>>> On 04/27/2016 01:22 PM, Yury Gribov wrote:
>>>>
>>>> Yeah. It seems that initfirst is a crude hack which bypasses all
>>>> dependency tracking. I wonder if there's a place for another,
>>>> hopefully saner, dependency-respecting flag.
>>>
>>>
>>> What use cases do you need to support?
>>>
>>> How would a dependency-respecting initfirst-like flag work given the
>>> conflicting requirements to initialize in dependency order and yet
>>> not initialize in dependency order?
>>>
>>
>> may be use a topological sort order where the
>> flagged module comes earliest possible?
>
>
> That was my expectation as well.

It seems like we have found two use cases for initfirst:

1) Changing the behaviour of libraries loaded later (pthread tells libc
   it needs to support multithreading before libc is initialized). This
   is for tightly coupled libraries and dependency order does not matter
   here assuming you're just setting flags.
2) Getting a very early hook into the program for profiling, etc.
   Dependency ordering is necessary only to ease the implementation of
   this hook. Any dependencies are not going to be handled by this hook
   (i.e. you can't profile libdl if you depend on it).

There are other ways of implementing hook dependencies. For example,
hook-safe libraries like libdl could check if they have been initialized,
and if not, initialize before processing the user's request (dlsym). Or,
a constructor could have the option to run other constructors by library
name (via ld.so's call_init).

I think a tool that needs serious control over the loading process will
simply reimplement its dependencies as I did to make sure it really knows
what's going on. The ordering at load-time is already complicated enough
as it is...

Would it be possible to just statically link libdl into the tool? Or
design libdl so that it doesn't need initializing? Then anyone can call
dlopen, dlsym, and do what they need to do.

>
>
>>> Today we have:
>>>
>>> - Library initializers and finalizers (Run in dep order)
>>> - Library constructors and destructors (Run in dep order)
>>> - Prioritized constructors and destructors (Run in dep order, and #
>>> order)
>>> - LD_PRELOAD initializer run last before the application is initialized.
>>> - Non-zero initialized data from .data in the ELF image.
>>>
>>> Dependency ordering also include symbol dependencies and relocation
>>> dependencies, sorting all objects into a linear list and breaking
>>
>>
>> if that was done correctly then i think LD_PRELOAD
>> libs would come before the modules using the interposed
>> symbols.
>>
>>> cycles where appropriate at deterministic points (though we need
>>> a better ldd to show these problems).
>>>
>>> It might be better if we had some kind of invariant assertions like
>>> "abort if I'm not initialized before library SONAME" that then allowed
>>> developers to realize their invariant is wrong and restructure the
>>> application. Alternatively if we had some better tooling that might
>>> help also (I'm working on an alternate eu-ldd and deterministic
>>> cycle breaking in ld.so, but it's a long way off).
>>>
>>
>> depending on elf ctor ordering sounds broken to me.
>> (incompatible with static linking)
>>
>> depending on deterministic ordering in the presence
>> of cycles sounds even more nonsensical.
>> (the dynamic linker shouldn't try to solve np hard
>> problems.)
>>
>>
>>
>
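The suggestion above, that a hook-safe library could check whether it has been initialized and initialize itself on first use, might look roughly like this. It is a sketch of the general pattern only, not libdl's code: `lib_init`, `lib_lookup`, `lib_init_count` and `table_size` are hypothetical names, and the guard is not thread-safe, on the assumption that early hooks run before any threads exist.

```c
#include <assert.h>
#include <stdbool.h>

static bool lib_ready;
static int init_calls;
static int table_size;

/* Idempotent initializer: safe to call from the constructor and from
   every public entry point. */
static void lib_init(void)
{
    if (lib_ready)
        return;
    table_size = 64; /* stand-in for whatever state the library sets up */
    init_calls++;
    lib_ready = true;
}

/* Normal path: the loader runs this in its usual constructor order. */
__attribute__((constructor)) static void lib_ctor(void)
{
    lib_init();
}

/* Public entry point: initializes on demand if a caller (for example an
   initfirst hook) gets here before lib_ctor has run. */
int lib_lookup(int key)
{
    lib_init();
    return key % table_size;
}

/* Reports how many times the real initialization work was performed. */
int lib_init_count(void)
{
    return init_calls;
}
```

Whichever of the constructor or an early caller arrives first does the work; the other becomes a no-op, so the library no longer cares where it lands in the initialization order.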
On 04/28/2016 11:54 AM, Szabolcs Nagy wrote:
> On 28/04/16 16:27, Carlos O'Donell wrote:
>> On 04/27/2016 01:22 PM, Yury Gribov wrote:
>>> Yeah. It seems that initfirst is a crude hack which bypasses all
>>> dependency tracking. I wonder if there's a place for another,
>>> hopefully saner, dependency-respecting flag.
>>
>> What use cases do you need to support?
>>
>> How would a dependency-respecting initfirst-like flag work given the
>> conflicting requirements to initialize in dependency order and yet
>> not initialize in dependency order?
>>
>
> may be use a topological sort order where the
> flagged module comes earliest possible?

That makes sense.

>> Today we have:
>>
>> - Library initializers and finalizers (Run in dep order)
>> - Library constructors and destructors (Run in dep order)
>> - Prioritized constructors and destructors (Run in dep order, and # order)
>> - LD_PRELOAD initializer run last before the application is initialized.
>> - Non-zero initialized data from .data in the ELF image.
>>
>> Dependency ordering also include symbol dependencies and relocation
>> dependencies, sorting all objects into a linear list and breaking
>
> if that was done correctly then i think LD_PRELOAD
> libs would come before the modules using the interposed
> symbols.

I *thought* this was what was implemented by _dl_lookup_symbol_x,
add_dependency, and link_map->l_reldeps, but after reviewing the code and
writing some test cases I found that it doesn't work for all use cases.

We have add_dependency, which is called from _dl_lookup_symbol_x and adds
symbol binding information, but it isn't used properly to sort
constructors in the -Wl,-z,now case (you can't do this in lazy mode).

For destructors though, in _dl_sort_fini, we do use l_reldeps to move
dependencies such that objects which are the target of a symbol binding
are destructed later (if possible).

For dlclose we prevent unloading a shared library that is the target of
a bound symbol used by another shared library (see the common problem with
STB_GNU_UNIQUE making C++ plugins unloadable).

>> cycles where appropriate at deterministic points (though we need
>> a better ldd to show these problems).
>>
>> It might be better if we had some kind of invariant assertions like
>> "abort if I'm not initialized before library SONAME" that then allowed
>> developers to realize their invariant is wrong and restructure the
>> application. Alternatively if we had some better tooling that might
>> help also (I'm working on an alternate eu-ldd and deterministic
>> cycle breaking in ld.so, but it's a long way off).
>>
>
> depending on elf ctor ordering sounds broken to me.
> (incompatible with static linking)

What we are talking about today is adding features to work around
existing design issues in applications. Properly designed applications
don't have these problems because they don't rely on constructor ordering
outside of a translation unit.

> depending on deterministic ordering in the presence
> of cycles sounds even more nonsensical.
> (the dynamic linker shouldn't try to solve np hard
> problems.)

Sorry, that's not what I wanted to imply.

All I wanted to say is that cycles should be broken following some fixed
set of rules that we can then test ourselves (without giving users any
guarantees) with regression tests, simply to validate the dynamic loader
is behaving as we expect.

I *do* think we need to provide a tool that shows users what the ctor/dtor
ordering is going to look like and why, for the given installed loader.
This could be used by support to diagnose problems and remind users not to
rely on ctor/dtor ordering.
On 04/29/2016 04:11 AM, Kostya Serebryany wrote:
>>>> I'm myself pretty sure that people would generally prefer to avoid
>>>> reimplementing parts of Glibc (symbol resolver in particular). Let's see
>>>> if Kostya has something to say.
>>>
>>> Hm? To say about what? :)
>>> We almost never use asan as a DSO on Linux, so we don't get any problems
>>> like this.
>>
>> Just to clarify: you mean you don't ASan DSO dependencies (libc,
>> libpthread, librt, etc.) to be initialized before ASan initialization (i.e.
>> __asan_init) runs? In that case the OP's patch would work for ASan DSO.
>
> No, I mean that we don't use asan.so, only asan.a, which calls __asan_init
> from the pre-init array.

Right, but preinit gets called prior to library ctors, which means that
__asan_init indeed does not depend on libc initializers having completed.

-Y
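For readers unfamiliar with the pre-init array mentioned above, here is a minimal sketch of the mechanism, not ASan's actual code. It assumes an ELF executable built with GCC or Clang against glibc, which passes `argc`, `argv` and `envp` to these functions; `.preinit_array` entries are only honoured in the main executable, not in shared objects, which is why this route is open to asan.a but not to a DSO. The names `early_hook`, `preinit_entry` and `preinit_ran_first` are hypothetical.

```c
#include <assert.h>

static int preinit_done;
static int seen_by_ctor;

/* Runs from the executable's pre-init array, before its constructors. */
static void early_hook(int argc, char **argv, char **envp)
{
    (void)argc;
    (void)argv;
    (void)envp;
    preinit_done = 1;
}

/* Place a pointer to the hook in .preinit_array; "used" keeps the
   otherwise unreferenced variable from being discarded. */
__attribute__((section(".preinit_array"), used))
static void (*preinit_entry)(int, char **, char **) = early_hook;

/* An ordinary constructor: records whether the hook had already run. */
__attribute__((constructor)) static void ordinary_ctor(void)
{
    seen_by_ctor = preinit_done;
}

/* Returns 1 if the pre-init hook ran before the constructor. */
int preinit_ran_first(void)
{
    return seen_by_ctor;
}
```

Because the hook runs so early, it has to restrict itself to code that works without the initializers of other objects having completed, which is the constraint discussed throughout this thread.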
On 04/26/2016 10:58 PM, d wk wrote:
> In a project of mine, I needed to run some code before any constructors
> from any system libraries (such as libc or libpthread). The linker/loader
> -z initfirst feature is perfect for this, but it only supports one shared
> library.

Does -z initfirst affect just constructor execution, or symbol binding as
well?

As initfirst is currently defined for GNU, there can only be one initfirst
library.

Florian
On 05/02/2016 11:06 AM, Florian Weimer wrote:
> On 04/26/2016 10:58 PM, d wk wrote:
>> In a project of mine, I needed to run some code before any constructors
>> from any system libraries (such as libc or libpthread). The linker/loader
>> -z initfirst feature is perfect for this, but it only supports one shared
>> library.
>
> Does -z initfirst affect just constructor execution, or symbol binding
> as well?

It's only about construction order.

>
> As initfirst is currently defined for GNU, there can only be one
> initfirst library.
>
> Florian
>
>
Hi Carlos:

On Thu, Apr 28, 2016 at 2:47 PM, d wk <dwksrc@gmail.com> wrote:
> On Thu, Apr 28, 2016 at 10:12 AM, Carlos O'Donell <carlos@redhat.com> wrote:
>>
>> (4) Copyright status?
>>
>> Please clarify your copyright status with the project:
>> https://sourceware.org/glibc/wiki/Contribution%20checklist#FSF_copyright_Assignment
>>
>> I don't see your name on the copyright assignment documents
>> from the FSF.
>>
>> The easiest form is:
>> http://git.savannah.gnu.org/cgit/gnulib.git/plain/doc/Copyright/request-assign.future
>>
>> It assigns all current and future contributions to glibc to the FSF and
>> allows us to accept them immediately.
>
> I sent the form in, hopefully it'll be processed soon. I am a researcher
> and there are unlikely to be any issues in this regard.

Just checking to see if you'd had a chance to look at this patch yet. Is
there anything else I need to do, administratively speaking, before you
can look at it?

Thanks,
-dwk.
diff --git a/elf/dl-init.c b/elf/dl-init.c
index 818c3aa..da59d1f 100644
--- a/elf/dl-init.c
+++ b/elf/dl-init.c
@@ -84,7 +84,14 @@ _dl_init (struct link_map *main_map, int argc, char **argv, char **env)
 
   if (__glibc_unlikely (GL(dl_initfirst) != NULL))
     {
-      call_init (GL(dl_initfirst), argc, argv, env);
+      struct initfirst_list *initfirst;
+      for(initfirst = GL(dl_initfirst); initfirst; initfirst = initfirst->next)
+        {
+          call_init (initfirst->which, argc, argv, env);
+        }
+
+      /* We do not try to free this list, as the memory will not be reclaimed
+         by the allocator unless there were no intervening malloc()'s. */
       GL(dl_initfirst) = NULL;
     }
 
diff --git a/elf/dl-load.c b/elf/dl-load.c
index c0d6249..1efabbf 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -1388,7 +1388,24 @@ cannot enable executable stack as shared object requires");
 
   /* Remember whether this object must be initialized first. */
   if (l->l_flags_1 & DF_1_INITFIRST)
-    GL(dl_initfirst) = l;
+    {
+      struct initfirst_list *new_node = malloc(sizeof(*new_node));
+      struct initfirst_list *it = GL(dl_initfirst);
+      new_node->which = l;
+      new_node->next = NULL;
+
+      /* We append to the end of the linked list. Whichever library was loaded
+         first has higher initfirst priority. This means that LD_PRELOAD
+         initfirst overrides initfirst in libraries linked normally. */
+      if (!it)
+        GL(dl_initfirst) = new_node;
+      else
+        {
+          while (it->next)
+            it = it->next;
+          it->next = new_node;
+        }
+    }
 
   /* Finally the file information. */
   l->l_file_id = id;
diff --git a/elf/dl-support.c b/elf/dl-support.c
index c30194c..d8b8acc 100644
--- a/elf/dl-support.c
+++ b/elf/dl-support.c
@@ -147,8 +147,8 @@ struct r_search_path_elem *_dl_all_dirs;
 /* All directories after startup. */
 struct r_search_path_elem *_dl_init_all_dirs;
 
-/* The object to be initialized first. */
-struct link_map *_dl_initfirst;
+/* The list of objects to be initialized first. */
+struct initfirst_list *_dl_initfirst;
 
 /* Descriptor to write debug messages to. */
 int _dl_debug_fd = STDERR_FILENO;
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index ddec0be..198c089 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -326,8 +326,11 @@ struct rtld_global
   /* Incremented whenever something may have been added to dl_loaded. */
   EXTERN unsigned long long _dl_load_adds;
 
-  /* The object to be initialized first. */
-  EXTERN struct link_map *_dl_initfirst;
+  /* The list of objects to be initialized first. */
+  EXTERN struct initfirst_list {
+    struct link_map *which;
+    struct initfirst_list *next;
+  } *_dl_initfirst;
 
 #if HP_SMALL_TIMING_AVAIL
   /* Start time on CPU clock. */
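The list handling in the patch can be tried outside the loader with a standalone sketch like the one below. It is a model under stated assumptions, not glibc code: `struct object` stands in for `struct link_map`, and `initfirst_append`, `initfirst_run`, `record_init` and the `initfirst_seen` log are hypothetical names. Unlike the patch, the sketch checks the `malloc` result and frees the nodes after use, since it runs on the ordinary allocator rather than the loader's.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct object { const char *name; }; /* stand-in for struct link_map */

struct initfirst_list {
    struct object *which;
    struct initfirst_list *next;
};

/* Appends obj at the tail so that objects keep their load order.
   Returns 0 on success, -1 if allocation fails. */
int initfirst_append(struct initfirst_list **head, struct object *obj)
{
    struct initfirst_list *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->which = obj;
    node->next = NULL;

    struct initfirst_list **tail = head;
    while (*tail != NULL)
        tail = &(*tail)->next;
    *tail = node;
    return 0;
}

/* Calls init on each object in list order, then empties the list.
   Returns the number of objects initialized. */
size_t initfirst_run(struct initfirst_list **head,
                     void (*init)(struct object *))
{
    size_t n = 0;
    struct initfirst_list *it = *head;
    while (it != NULL) {
        struct initfirst_list *next = it->next;
        init(it->which);
        free(it);
        it = next;
        n++;
    }
    *head = NULL;
    return n;
}

/* Example callback: logs the names of the objects it is given. */
const char *initfirst_seen[8];
size_t initfirst_seen_count;

void record_init(struct object *obj)
{
    if (initfirst_seen_count < 8)
        initfirst_seen[initfirst_seen_count++] = obj->name;
}
```

Appending a preloaded tool first and a threading library second, then running the list with `record_init`, logs the tool before the threading library, which mirrors the load-order priority described in the patch comment.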