Message ID | ormw7ol1sf.fsf@free.home |
---|---|
State | New |
Headers | show |
Use __glibc_{,un}likely rather than __builtin_expect. Can you confirm that this does not change the set of states that might be seen by td_thr_tlsbase? Thanks, Roland
On Nov 18, 2014, Roland McGrath <roland@hack.frob.com> wrote: > Use __glibc_{,un}likely rather than __builtin_expect. *nod*, I've now changed the code I had reindented. > Can you confirm that this does not change the set of states that might be > seen by td_thr_tlsbase? No, but I can confirm that, after this change, td_thr_tlsbase may return as much garbage for Static TLS modules as the current code may for dynamic TLS modules, since it doesn't check generation counts. It would be possible to change it so that it compares the generation count of the module and that of the DTV, so as to avoid returning garbage; it could even compute the address for Static TLS modules, so that we kept on returning the same pointer, regardless of what is actually in the DTV. Should it?
> No, but I can confirm that, after this change, td_thr_tlsbase may return > as much garbage for Static TLS modules as the current code may for > dynamic TLS modules, since it doesn't check generation counts. That sounds like you're saying your change causes a regression, which incidentally has the same failure mode as an existing bug for a different circumstance. The existing bug existing is reason to fix it, and indeed the existing bug is not the fault of your change. But that bug existing is not an excuse to cause a regression in a related case. > It would be possible to change it so that it compares the generation > count of the module and that of the DTV, so as to avoid returning > garbage; it could even compute the address for Static TLS modules, so > that we kept on returning the same pointer, regardless of what is > actually in the DTV. Should it? td_the_tlsbase should return success and yield the correct pointer in any circumstance where the TLS block for the module and thread requested has already been initialized. It should fail with TD_TLSDEFER only when the thread could not possibly have observed any values in that TLS block. That way, the debugger can fall back to showing initial values from the PT_TLS segment (and refusing attempts to mutate) for the TD_TLSDEFER case, and never fail to make the values the program will actually see available to the user of the debugger. I gather from what you've said that it does not already meet that standard in all cases. We should fix that, but do it in a way that does not leave any intermediate state that is a regression from the status quo ante. Thanks, Roland
On Nov 18, 2014, Roland McGrath <roland@hack.frob.com> wrote: >> No, but I can confirm that, after this change, td_thr_tlsbase may return >> as much garbage for Static TLS modules as the current code may for >> dynamic TLS modules, since it doesn't check generation counts. > That sounds like you're saying your change causes a regression, which > incidentally has the same failure mode as an existing bug for a > different circumstance. The lack of docs prevents from from concluding it's a regression, rather than a change in behavior within the intended boundaries, and the lack of special-casing for Static TLS in the implementation doesn't enable me to conclude it's a different circumstance, but I'd trade run-time TLS failures, memory corruption and memory leaks for more frequent garbage out of nptl_db any time. Anyway, if we had to choose between the two, I'd easily go with the patch that makes this tradeoff. (Not that I had *meant* the patch as a trade-off, mind you. I didn't realize nptl_db could be affected by this patch, so I only looked into it after you pointed it out. So I thank you for catching this, regardless of whether we agree on whether the change in behavior is a regression or a legitimate change within the (un?)defined boundaries of behavior for that function.) Fortunately, we don't have to make that choice; we can have the fix for the TLS bugs, and we can have reliable TLS accessors in nptl_db. There is a caveats, however: we need some means to find a module's DTV generation and Static TLS offset, given a modid. The offset can be obtained from the link_map, whereas the generation and the link_map pointer can be obtained from the dtv_slotinfo_list. However, we have no immediate means to get ot either of these data structures. The main difficulty I'm yet to address (the rest is already done; I'll gladly share the WIP patch if anyone's interested) is that _dl_tls_dtv_slotinfo_list is defined under different names for static and dynamic programs, a stand-alone symbol for the former and a _rtld_global member in the latter, and we can't assume from say dynamic nptl whether we're gonna be dlopened by a static libdl, or by the dynamic ld.so, and a libpthread.a might be linked into a dynamic executable, or even into a dlopened library. I'm not sure which, if any, of these cases we regard as unsupported, but I see difficulties in handling them all. Plus, nptl_db seems to be only equipped to look up symbols in libpthread proper, and nothing in there links back to _rtld_global or its corresponding standalone symbols. I see a number of possibilities to overcome this: - add a symbol to libpthread that, during early initialization, is set to points to the head of the dtv slotinfo list. I'm not sure yet how it would be initialized though to work under all the situations above. - add to link_maps a pointer to the corresponding slotinfo entry, so that we can get ahold of the generation starting from the link_map. This would enable at least td_thr_tls_get_addr to DTRT; td_thr_tlsbase, taking only modids, would require some means to map those back to link_maps to DTRT. - iterate over the link_map list to locate a module with the looked-for modid. Horribly inefficient, but lacking other means, this could be a viable last resort. We'd still need gen counts added to link_maps though. - special-case the _dl_tls_dtv_slotinfo_list lookup so that we can find it both as a member of _rtld_global, defined in ld.so, and as a stand-alone symbol defined in the main static executable. (Can ps_pglobal_lookup be generally used to look up a symbol globally, if given a NULL object_name or somesuch? I can't find documentation for this interface, and we only use it to look up symbols in libpthread_so. Plus, what if we can't get to the slotinfo_list, say, because the static executable is stripped and it hasn't exported this dynamic symbol we need?) - other possibilities I haven't considered? Any recommendations? > td_the_tlsbase should return success and yield the correct pointer in > any circumstance where the TLS block for the module and thread requested > has already been initialized. It should fail with TD_TLSDEFER only when > the thread could not possibly have observed any values in that TLS > block. That way, the debugger can fall back to showing initial values > from the PT_TLS segment (and refusing attempts to mutate) for the > TD_TLSDEFER case, and never fail to make the values the program will > actually see available to the user of the debugger. This sounds like a very reasonable goal. Is this documented anywhere? If not, may I paste your paragraph as a comment next to the function definition? Or to its declaration? Thanks,
> - add a symbol to libpthread that, during early initialization, is set > to points to the head of the dtv slotinfo list. I'm not sure yet how it > would be initialized though to work under all the situations above. This would not be too bad. But it's of course better to have zero runtime overhead if it's feasible, which it seems like it is. > - special-case the _dl_tls_dtv_slotinfo_list lookup so that we can find > it both as a member of _rtld_global, defined in ld.so, and as a > stand-alone symbol defined in the main static executable. I wouldn't really call this a special case. It's easy enough to extend the DB_SYMBOL macro, db-symbols.h, and td_symbol_list.c to distinguish the object name for different symbols. Then _rtld_global is a symbol like any other (but for LD_SO instead of LIBPTHREAD_SO). The td_thr_tlsbase code can just look for _rtld_global first and if it doesn't find that, look for _dl_tls_dtv_slotinfo_list instead. This seems like the right thing to do. > (Can > ps_pglobal_lookup be generally used to look up a symbol globally, if > given a NULL object_name or somesuch? I can't find documentation for > this interface, and we only use it to look up symbols in libpthread_so. It's possible there's some old Solaris documentation to be found for the details of the proc_service.h interface, since they invented it. But all that really matters is how GDB has used it for glibc systems, which might not actually adhere to any to what any such Solaris documentation prescribes. In actual fact, GDB ignores the argument entirely and just does a vanilla symbol lookup like you'd get if you said "p &symbol". That's why it works at all now for statically-linked libpthread, where the main executable is where the symbols are actually found. That said, since the interface has the parameter, I'd still be in favor of passing in the correct object name just pro forma (since it's really not hard to implement). > Plus, what if we can't get to the slotinfo_list, say, because the static > executable is stripped and it hasn't exported this dynamic symbol we > need?) Then td_ta_new will have failed to find the nptl_version symbol and no use of libthread_db will ever succeed, so you won't be there to wonder. (It needs some local symbols like this one, so a stripped executable with libpthread statically linked never works, even if it is a dynamic executable that exports every global in libpthread.a.) > > td_the_tlsbase should return success and yield the correct pointer in > > any circumstance where the TLS block for the module and thread requested > > has already been initialized. It should fail with TD_TLSDEFER only when > > the thread could not possibly have observed any values in that TLS > > block. That way, the debugger can fall back to showing initial values > > from the PT_TLS segment (and refusing attempts to mutate) for the > > TD_TLSDEFER case, and never fail to make the values the program will > > actually see available to the user of the debugger. > > This sounds like a very reasonable goal. Is this documented anywhere? > If not, may I paste your paragraph as a comment next to the function > definition? Or to its declaration? The Solaris documentation specifies td_thr_tlsbase quite simply, "... returns the base address of a threads local storage block..." The quality of implementation target that I stated just seems obvious. Use my text however you like. Thanks, Roland
On 11/20/2014 02:17 AM, Roland McGrath wrote: > In actual fact, GDB ignores the argument entirely and just does a vanilla > symbol lookup like you'd get if you said "p &symbol". That's why it works > at all now for statically-linked libpthread, where the main executable is > where the symbols are actually found. Indeed. I once tried to change GDB to actually use the passed in argument, but found that broke the statically-linked libpthread case. > That said, since the interface has > the parameter, I'd still be in favor of passing in the correct object name > just pro forma (since it's really not hard to implement). Note that's really not sufficient for getting the statically-linked case fully right. Some of the libpthread symbols thread_db wants to looks up, although static, have names that aren't in the implementation namespace. As such, if the statically-linked program happens to have other symbols with the same name, GDB might well find and return those to nptl_db instead. For example, it should be perfectly legitimate for a program to define a symbol called "stack_used", but that ends up confusing nptl_db/gdb, because that's one of the static libpthread symbols nptl_db looks up: /* List of the stacks in use. */ static LIST_HEAD (stack_used); And that's exactly what happened in: [Bug 9635 - Cannot find new threads: generic error] https://sourceware.org/bugzilla/show_bug.cgi?id=9635 Actually looks like stack_used may be the only one with the problem: $ grep -rn DB_GET_SYMBOL -rn nptl_db/td_ta_thr_iter.c:157: err = DB_GET_SYMBOL (list, ta, __stack_user); nptl_db/td_ta_thr_iter.c:164: err = DB_GET_SYMBOL (list, ta, stack_used); nptl_db/td_ta_tsd_iter.c:53: err = DB_GET_SYMBOL (addr, ta, __pthread_keys); nptl_db/td_ta_event_addr.c:40: err = DB_GET_SYMBOL (taddr, ta, __nptl_create_event); nptl_db/td_ta_event_addr.c:44: err = DB_GET_SYMBOL (taddr, ta, __nptl_death_event); nptl_db/td_thr_validate.c:65: err = DB_GET_SYMBOL (list, th->th_ta_p, __stack_user); nptl_db/td_thr_validate.c:73: err = DB_GET_SYMBOL (list, th->th_ta_p, stack_used); nptl_db/td_ta_map_lwp2thr.c:190: td_err_e err = DB_GET_SYMBOL (list, ta, __stack_user); nptl_db/td_ta_set_event.c:41: err = DB_GET_SYMBOL (eventmask, ta, __nptl_threads_events); nptl_db/thread_dbP.h:150:#define DB_GET_SYMBOL(var, ta, name) \ nptl_db/td_thr_event_getmsg.c:71: err = DB_GET_SYMBOL (prevp, th->th_ta_p, __nptl_last_event); nptl_db/td_ta_clear_event.c:41: err = DB_GET_SYMBOL (eventmask, ta, __nptl_threads_events); Though renaming __stack_user to __nptl_stack_user too might be good, in case other parts of the implementation (e.g., libgcc) might end up defining their own unrelated static symbols with that name too. Thanks, Pedro Alves
On 11/20/2014 10:28 AM, Pedro Alves wrote: > On 11/20/2014 02:17 AM, Roland McGrath wrote: >> In actual fact, GDB ignores the argument entirely and just does a vanilla >> symbol lookup like you'd get if you said "p &symbol". That's why it works >> at all now for statically-linked libpthread, where the main executable is >> where the symbols are actually found. > > Indeed. I once tried to change GDB to actually use the passed in argument, > but found that broke the statically-linked libpthread case. > >> That said, since the interface has >> the parameter, I'd still be in favor of passing in the correct object name >> just pro forma (since it's really not hard to implement). > > Note that's really not sufficient for getting the statically-linked case > fully right. Some of the libpthread symbols thread_db wants to looks up, > although static, have names that aren't in the implementation namespace. As > such, if the statically-linked program happens to have other symbols with the > same name, GDB might well find and return those to nptl_db instead. FYI, filed PR17629 for this now. https://sourceware.org/bugzilla/show_bug.cgi?id=17629 Thanks, Pedro Alves > > For example, it should be perfectly legitimate for a program to define a > symbol called "stack_used", but that ends up confusing nptl_db/gdb, > because that's one of the static libpthread symbols nptl_db looks up: > > /* List of the stacks in use. */ > static LIST_HEAD (stack_used); > > And that's exactly what happened in: > > [Bug 9635 - Cannot find new threads: generic error] > https://sourceware.org/bugzilla/show_bug.cgi?id=9635 > > Actually looks like stack_used may be the only one with the problem: > > $ grep -rn DB_GET_SYMBOL -rn > nptl_db/td_ta_thr_iter.c:157: err = DB_GET_SYMBOL (list, ta, __stack_user); > nptl_db/td_ta_thr_iter.c:164: err = DB_GET_SYMBOL (list, ta, stack_used); > nptl_db/td_ta_tsd_iter.c:53: err = DB_GET_SYMBOL (addr, ta, __pthread_keys); > nptl_db/td_ta_event_addr.c:40: err = DB_GET_SYMBOL (taddr, ta, __nptl_create_event); > nptl_db/td_ta_event_addr.c:44: err = DB_GET_SYMBOL (taddr, ta, __nptl_death_event); > nptl_db/td_thr_validate.c:65: err = DB_GET_SYMBOL (list, th->th_ta_p, __stack_user); > nptl_db/td_thr_validate.c:73: err = DB_GET_SYMBOL (list, th->th_ta_p, stack_used); > nptl_db/td_ta_map_lwp2thr.c:190: td_err_e err = DB_GET_SYMBOL (list, ta, __stack_user); > nptl_db/td_ta_set_event.c:41: err = DB_GET_SYMBOL (eventmask, ta, __nptl_threads_events); > nptl_db/thread_dbP.h:150:#define DB_GET_SYMBOL(var, ta, name) \ > nptl_db/td_thr_event_getmsg.c:71: err = DB_GET_SYMBOL (prevp, th->th_ta_p, __nptl_last_event); > nptl_db/td_ta_clear_event.c:41: err = DB_GET_SYMBOL (eventmask, ta, __nptl_threads_events); > > Though renaming __stack_user to __nptl_stack_user too might be good, in case > other parts of the implementation (e.g., libgcc) might end up defining their > own unrelated static symbols with that name too. > > Thanks, > Pedro Alves >
diff --git a/NEWS b/NEWS index b152488..4f208a2 100644 --- a/NEWS +++ b/NEWS @@ -9,10 +9,10 @@ Version 2.21 * The following bugs are resolved with this release: - 6652, 12926, 14132, 14138, 14171, 15215, 15884, 17266, 17344, 17363, - 17370, 17371, 17411, 17460, 17475, 17485, 17501, 17506, 17508, 17522, - 17555, 17570, 17571, 17572, 17573, 17574, 17582, 17583, 17584, 17585, - 17589, 17594, 17616. + 6652, 12926, 14132, 14138, 14171, 15215, 15884, 17090, 17266, 17344, + 17363, 17370, 17371, 17411, 17460, 17475, 17485, 17501, 17506, 17508, + 17522, 17555, 17570, 17571, 17572, 17573, 17574, 17582, 17583, 17584, + 17585, 17589, 17594, 17616, 17620, 17621. * The minimum GCC version that can be used to build this version of the GNU C Library is GCC 4.6. Older GCC versions, and non-GNU compilers, can diff --git a/elf/dl-open.c b/elf/dl-open.c index 7cc4cc1..9d6006b 100644 --- a/elf/dl-open.c +++ b/elf/dl-open.c @@ -531,17 +531,7 @@ TLS generation counter wrapped! Please report this.")); && imap->l_tls_blocksize > 0) { /* For static TLS we have to allocate the memory here and - now. This includes allocating memory in the DTV. But we - cannot change any DTV other than our own. So, if we - cannot guarantee that there is room in the DTV we don't - even try it and fail the load. - - XXX We could track the minimum DTV slots allocated in - all threads. */ - if (! RTLD_SINGLE_THREAD_P && imap->l_tls_modid > DTV_SURPLUS) - _dl_signal_error (0, "dlopen", NULL, N_("\ -cannot load any more object with static TLS")); - + now, but we can delay updating the DTV. */ imap->l_need_tls_init = 0; #ifdef SHARED /* Update the slot information data for at least the diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c index 97a7119..1d66f79 100644 --- a/elf/dl-reloc.c +++ b/elf/dl-reloc.c @@ -136,12 +136,6 @@ _dl_nothread_init_static_tls (struct link_map *map) # error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined" #endif - /* Fill in the DTV slot so that a later LD/GD access will find it. */ - dtv_t *dtv = THREAD_DTV (); - assert (map->l_tls_modid <= dtv[-1].counter); - dtv[map->l_tls_modid].pointer.val = dest; - dtv[map->l_tls_modid].pointer.is_static = true; - /* Initialize the memory. */ memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size), '\0', map->l_tls_blocksize - map->l_tls_initimage_size); diff --git a/elf/dl-tls.c b/elf/dl-tls.c index 5204fda..2408384 100644 --- a/elf/dl-tls.c +++ b/elf/dl-tls.c @@ -437,17 +437,14 @@ _dl_allocate_tls_init (void *result) assert (listp->slotinfo[cnt].gen <= GL(dl_tls_generation)); maxgen = MAX (maxgen, listp->slotinfo[cnt].gen); + dtv[map->l_tls_modid].pointer.val = TLS_DTV_UNALLOCATED; + dtv[map->l_tls_modid].pointer.is_static = false; + if (map->l_tls_offset == NO_TLS_OFFSET || map->l_tls_offset == FORCED_DYNAMIC_TLS_OFFSET) - { - /* For dynamically loaded modules we simply store - the value indicating deferred allocation. */ - dtv[map->l_tls_modid].pointer.val = TLS_DTV_UNALLOCATED; - dtv[map->l_tls_modid].pointer.is_static = false; - continue; - } + continue; - assert (map->l_tls_modid == cnt); + assert (map->l_tls_modid == total + cnt); assert (map->l_tls_blocksize >= map->l_tls_initimage_size); #if TLS_TCB_AT_TP assert ((size_t) map->l_tls_offset >= map->l_tls_blocksize); @@ -459,8 +456,6 @@ _dl_allocate_tls_init (void *result) #endif /* Copy the initialization image and clear the BSS part. */ - dtv[map->l_tls_modid].pointer.val = dest; - dtv[map->l_tls_modid].pointer.is_static = true; memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size), '\0', map->l_tls_blocksize - map->l_tls_initimage_size); @@ -632,10 +627,9 @@ _dl_update_slotinfo (unsigned long int req_modid) might still be allocated. */ if (! dtv[total + cnt].pointer.is_static && dtv[total + cnt].pointer.val != TLS_DTV_UNALLOCATED) - { - free (dtv[total + cnt].pointer.val); - dtv[total + cnt].pointer.val = TLS_DTV_UNALLOCATED; - } + free (dtv[total + cnt].pointer.val); + dtv[total + cnt].pointer.val = TLS_DTV_UNALLOCATED; + dtv[total + cnt].pointer.is_static = false; continue; } @@ -698,10 +692,8 @@ _dl_update_slotinfo (unsigned long int req_modid) memalign and not malloc. */ free (dtv[modid].pointer.val); - /* This module is loaded dynamically- We defer memory - allocation. */ - dtv[modid].pointer.is_static = false; dtv[modid].pointer.val = TLS_DTV_UNALLOCATED; + dtv[modid].pointer.is_static = false; if (modid == req_modid) the_map = map; @@ -739,7 +731,6 @@ tls_get_addr_tail (GET_ADDR_ARGS, dtv_t *dtv, struct link_map *the_map) the_map = listp->slotinfo[idx].map; } - again: /* Make sure that, if a dlopen running in parallel forces the variable into static storage, we'll wait until the address in the static TLS block is set up, and use that. If we're undecided @@ -753,22 +744,28 @@ tls_get_addr_tail (GET_ADDR_ARGS, dtv_t *dtv, struct link_map *the_map) the_map->l_tls_offset = FORCED_DYNAMIC_TLS_OFFSET; __rtld_lock_unlock_recursive (GL(dl_load_lock)); } - else + else if (__builtin_expect (the_map->l_tls_offset + != FORCED_DYNAMIC_TLS_OFFSET, 1)) { +#if TLS_TCB_AT_TP + void *p = (char *) THREAD_SELF - the_map->l_tls_offset; +#elif TLS_DTV_AT_TP + void *p = (char *) THREAD_SELF + the_map->l_tls_offset + TLS_PRE_TCB_SIZE; +#else +# error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined" +#endif __rtld_lock_unlock_recursive (GL(dl_load_lock)); - if (__builtin_expect (the_map->l_tls_offset - != FORCED_DYNAMIC_TLS_OFFSET, 1)) - { - void *p = dtv[GET_ADDR_MODULE].pointer.val; - if (__glibc_unlikely (p == TLS_DTV_UNALLOCATED)) - goto again; - return (char *) p + GET_ADDR_OFFSET; - } + dtv[GET_ADDR_MODULE].pointer.is_static = true; + dtv[GET_ADDR_MODULE].pointer.val = p; + + return (char *) p + GET_ADDR_OFFSET; } + else + __rtld_lock_unlock_recursive (GL(dl_load_lock)); } void *p = dtv[GET_ADDR_MODULE].pointer.val = allocate_and_init (the_map); - dtv[GET_ADDR_MODULE].pointer.is_static = false; + assert (!dtv[GET_ADDR_MODULE].pointer.is_static); return (char *) p + GET_ADDR_OFFSET; } diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c index 8cf0274..8b053c1 100644 --- a/nptl/allocatestack.c +++ b/nptl/allocatestack.c @@ -1190,7 +1190,6 @@ __nptl_setxid (struct xid_command *cmdp) static inline void __attribute__((always_inline)) init_one_static_tls (struct pthread *curp, struct link_map *map) { - dtv_t *dtv = GET_DTV (TLS_TPADJ (curp)); # if TLS_TCB_AT_TP void *dest = (char *) curp - map->l_tls_offset; # elif TLS_DTV_AT_TP @@ -1199,11 +1198,9 @@ init_one_static_tls (struct pthread *curp, struct link_map *map) # error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined" # endif - /* Fill in the DTV slot so that a later LD/GD access will find it. */ - dtv[map->l_tls_modid].pointer.val = dest; - dtv[map->l_tls_modid].pointer.is_static = true; - - /* Initialize the memory. */ + /* We cannot delay the initialization of the Static TLS area, since + it can be accessed with LE or IE, but since the DTV is only used + by GD and LD, we can delay its update to avoid a race. */ memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size), '\0', map->l_tls_blocksize - map->l_tls_initimage_size); }