From patchwork Sat Jul 20 17:51:13 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chung-Lin Tang X-Patchwork-Id: 1134421 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=sourceware.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=libc-alpha-return-103845-incoming=patchwork.ozlabs.org@sourceware.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=mentor.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.b="psNNx2Kn"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 45rb5v3bDJz9s7T for ; Sun, 21 Jul 2019 03:51:31 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:reply-to:from:subject:to:message-id:date :mime-version:content-type; q=dns; s=default; b=loxq1rDYea8paCym yx3jL2WU+I+1TtbPD31vrLE1FtL4n7QtmajT5Gpwb1g+o0WuwnI6POJ9aaFiX/Cp Wt8YHO8XqD2lRROIYyMwY4y+JWSmVPo1xIQsUuhX0PRuB6eMKKDYtLtRqoMkxgsi q4vKEUz6ezIpezNBZp+YR+7lFCI= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:reply-to:from:subject:to:message-id:date :mime-version:content-type; s=default; bh=3OjW0UkYwFfBbp7u5FtkQx 2P7CU=; b=psNNx2Knb9/aXH45km9I89VdyOOM7YCmVOMvsD+aEv+DBJW0xPi2or 0BUizld1WBo8w71jpBo/6qpBcj5RMm3D8x1zaYM5pO7kfNw+b72jRJNCkoco9g+c jq6f0mHujHBwGHLrkxGZn378XbIg9HiBWTgHeDa9llljbM1gLmu7k= Received: (qmail 39869 invoked by alias); 20 Jul 2019 17:51:25 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 39861 invoked by uid 89); 20 Jul 2019 17:51:25 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-15.3 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.1 spammy=collecting, incredible, lowest, harder X-HELO: relay1.mentorg.com Reply-To: From: Chung-Lin Tang Subject: [PATCH 2/2][RFC] #17645, fix slow DSO sorting behavior in dynamic loader To: GNU C Library Message-ID: <6c860783-2b6c-d309-3fd1-77867731fbba@mentor.com> Date: Sun, 21 Jul 2019 01:51:13 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:60.0) Gecko/20100101 Thunderbird/60.8.0 MIME-Version: 1.0 This part is the actual code changes. While the past attempts appeared to be either (1) more sophisticated graph algorithms, with attempts to add Tarjan SCC, or (2) modifying of heuristics to augment the old algorithm to behave more reasonably, here I have basically adhered to the KISS principle. The main algorithm here is simply depth-first search (DFS) to obtain the Reverse-Post Order (RPO) sequence, a topological sort. A new l_visited:1 bitfield is added to struct link_map to more elegantly facilitate such a search. I also have experimented with doing an iterative version, but it is obviously harder to understand, and actually slower when measured by hp-timing.h facilities. I have chosen to use simple recursive DFS, for clarity and performance (Some measures were however taken to curb recursion depth) The DFS algorithm is applied to the input maps[nmap-1] backwards towards maps[0]. This has the effect of a more "shallow" recursion depth in general since the input is in BFS. Also, when combined with the natural order of processing l_initfini[] at each node, this creates a resulting output sorting closer to the intuitive "left-to-right" order in most cases. Per-the discussion in #15311 about relocation dependencies overriding link dependencies, similar to comments #7,#9 there, a second pass of link-dependency-only sorting has been added in that case. Detection of existence of reldeps is done during the first DFS traversal pass, to cull away unneeded cases of this 2nd sorting pass. This also allows discarding of the simple limited cycle detection (i.e. X has reldep on Y, Y links to X) in the current algorithm. A testcase expressing the #15311 case has also been added. On the further general issue of circular dependencies causing SCCs across shared objects, the ELF specification explicitly states that behavior in this case is undefined, although I have found at least one reference describing Solaris' behavior here as basically retaining the received original ordering of those objects [1]. While quite well defined, I'm a little unsure this is the reasonable behavior, as this seems to imply a single circular dependency link will nullify correct topological dependency behavior for the majority of nodes within that SCC. [1] https://docs.oracle.com/cd/E19957-01/806-0641/6j9vuquip/index.html (section "Initialization and Termination Routines") The Tarjan SCC algorthm has been mentioned multiple times in these related BZ issues. It could be said that the Tarjan algorithm is a generalization of post-order DFS traversal; some might say that's an understatement, but the phases of the node visiting and processing really look analogous. It would be possible to extend and implement it mostly within the confines of the code of my patch, but considering the undefined status in the spec, arguably some ambiguities of proper reasonable behavior, and the much more bookkeeping in a piece of code that will be repeatedly executed an incredible number of times across all systems, of which only applies to quite rare cases, I have refrained from adding that kind of processing in this patch, though such issues may be revisited later. Other two notable implementation adjustments related to this _dl_sort_maps() change are: (1) The additional pass of sorting in dl_open_worker() right before relocation has been removed. _dl_map_object_deps() already does a pass of sorting, and simply collecting objects by that order is adequate. Sorting again in that place before relocation appears to be redundant. (2) The use of two char arrays 'used' and 'done' in _dl_close_worker to represent two per-map attributes has been changed to simply use the two bits in the 'l_reserved' field in struct link_map to implement. This also allows discarding the clunky 'used' array sorting that _dl_sort_maps had to (sometimes) do along the way. This patch has been tested on x86_64-linux, powerpc64le-linux, aarch64-linux without regressions (includes the new tests I've added). Requesting for comments and discussion, and of course later, approval to apply to master at appropriate stage. Thanks, Chung-Lin 2019-07-20 Chung-Lin Tang [BZ #17645] [BZ #15311] [BZ #15310] * elf/dl-close.c (MAP_DONE,MAP_USED,SET_MAP_DONE,SET_MAP_USED): New helper macros. (_dl_close_worker): Remove used[], done[] char arrays, use above macros, add l_reserved init/fini code, adjust call to _dl_sort_maps. * elf/dl-deps.c (_dl_map_object_deps): Adjust call to _dl_sort_maps. * elf/dl-fini.c (_dl_fini): Likewise. * elf/dl-open.c (dl_open_worker): Remove call to _dl_sort_maps, adjust surrounding code and comments. * elf/dl-sort_maps.c (dfs_traversal): New static function. (_dl_sort_maps): New Reverse-Postorder (RPO) based implementation. * include/link.h (struct link_map): Add l_visited:1 bitfield. * sysdeps/generic/ldsodefs.h (_dl_sort_maps): Adjust declaration. diff --git a/elf/dl-close.c b/elf/dl-close.c index 4aef95a1a0..2b24d63422 100644 --- a/elf/dl-close.c +++ b/elf/dl-close.c @@ -43,6 +43,11 @@ typedef void (*fini_t) (void); /* Special l_idx value used to indicate which objects remain loaded. */ #define IDX_STILL_USED -1 +/* We use the two l_reserved bits for 'used' and 'done' attributes. */ +#define MAP_DONE(L) ((L)->l_reserved & 1) +#define MAP_USED(L) ((L)->l_reserved & 2) +#define SET_MAP_DONE(L) ((L)->l_reserved |= 1) +#define SET_MAP_USED(L) ((L)->l_reserved |= 2) /* Returns true we an non-empty was found. */ static bool @@ -140,8 +145,6 @@ _dl_close_worker (struct link_map *map, bool force) bool any_tls = false; const unsigned int nloaded = ns->_ns_nloaded; - char used[nloaded]; - char done[nloaded]; struct link_map *maps[nloaded]; /* Clear DF_1_NODELETE to force object deletion. We don't need to touch @@ -157,24 +160,20 @@ _dl_close_worker (struct link_map *map, bool force) int idx = 0; for (struct link_map *l = ns->_ns_loaded; l != NULL; l = l->l_next) { + l->l_reserved = 0; l->l_idx = idx; maps[idx] = l; ++idx; - } assert (idx == nloaded); - /* Prepare the bitmaps. */ - memset (used, '\0', sizeof (used)); - memset (done, '\0', sizeof (done)); - /* Keep track of the lowest index link map we have covered already. */ int done_index = -1; while (++done_index < nloaded) { struct link_map *l = maps[done_index]; - if (done[done_index]) + if (MAP_DONE (l)) /* Already handled. */ continue; @@ -185,12 +184,12 @@ _dl_close_worker (struct link_map *map, bool force) /* See CONCURRENCY NOTES in cxa_thread_atexit_impl.c to know why acquire is sufficient and correct. */ && atomic_load_acquire (&l->l_tls_dtor_count) == 0 - && !used[done_index]) + && !MAP_USED (l)) continue; /* We need this object and we handle it now. */ - done[done_index] = 1; - used[done_index] = 1; + SET_MAP_DONE (l); + SET_MAP_USED (l); /* Signal the object is still needed. */ l->l_idx = IDX_STILL_USED; @@ -206,9 +205,9 @@ _dl_close_worker (struct link_map *map, bool force) { assert ((*lp)->l_idx >= 0 && (*lp)->l_idx < nloaded); - if (!used[(*lp)->l_idx]) + if (!MAP_USED (*lp)) { - used[(*lp)->l_idx] = 1; + SET_MAP_USED (*lp); /* If we marked a new object as used, and we've already processed it, then we need to go back and process again from that point forward to @@ -231,9 +230,9 @@ _dl_close_worker (struct link_map *map, bool force) { assert (jmap->l_idx >= 0 && jmap->l_idx < nloaded); - if (!used[jmap->l_idx]) + if (!MAP_USED (jmap)) { - used[jmap->l_idx] = 1; + SET_MAP_USED (jmap); if (jmap->l_idx - 1 < done_index) done_index = jmap->l_idx - 1; } @@ -241,10 +240,8 @@ _dl_close_worker (struct link_map *map, bool force) } } - /* Sort the entries. We can skip looking for the binary itself which is - at the front of the search list for the main namespace. */ - _dl_sort_maps (maps + (nsid == LM_ID_BASE), nloaded - (nsid == LM_ID_BASE), - used + (nsid == LM_ID_BASE), true); + /* Sort the entries. */ + _dl_sort_maps (maps, nloaded, true); /* Call all termination functions at once. */ #ifdef SHARED @@ -261,7 +258,7 @@ _dl_close_worker (struct link_map *map, bool force) /* All elements must be in the same namespace. */ assert (imap->l_ns == nsid); - if (!used[i]) + if (!MAP_USED (imap)) { assert (imap->l_type == lt_loaded && (imap->l_flags_1 & DF_1_NODELETE) == 0); @@ -323,7 +320,7 @@ _dl_close_worker (struct link_map *map, bool force) if (i < first_loaded) first_loaded = i; } - /* Else used[i]. */ + /* Else MAP_USED (imap). */ else if (imap->l_type == lt_loaded) { struct r_scope_elem *new_list = NULL; @@ -544,7 +541,7 @@ _dl_close_worker (struct link_map *map, bool force) for (unsigned int i = first_loaded; i < nloaded; ++i) { struct link_map *imap = maps[i]; - if (!used[i]) + if (!MAP_USED (imap)) { assert (imap->l_type == lt_loaded); @@ -798,6 +795,10 @@ _dl_close_worker (struct link_map *map, bool force) if (dl_close_state == rerun) goto retry; + /* Reset l_reserved bits to zero. */ + for (struct link_map *l = ns->_ns_loaded; l != NULL; l = l->l_next) + l->l_reserved = 0; + dl_close_state = not_pending; } diff --git a/elf/dl-deps.c b/elf/dl-deps.c index e12c353158..5698552f05 100644 --- a/elf/dl-deps.c +++ b/elf/dl-deps.c @@ -589,9 +589,7 @@ Filters not supported with LD_TRACE_PRELINKING")); itself will always be initialize last. */ memcpy (l_initfini, map->l_searchlist.r_list, nlist * sizeof (struct link_map *)); - /* We can skip looking for the binary itself which is at the front of - the search list. */ - _dl_sort_maps (&l_initfini[1], nlist - 1, NULL, false); + _dl_sort_maps (l_initfini, nlist, false); /* Terminate the list of dependencies. */ l_initfini[nlist] = NULL; diff --git a/elf/dl-fini.c b/elf/dl-fini.c index 1e55d39814..91ee57a68a 100644 --- a/elf/dl-fini.c +++ b/elf/dl-fini.c @@ -88,11 +88,8 @@ _dl_fini (void) assert (ns == LM_ID_BASE || i == nloaded || i == nloaded - 1); unsigned int nmaps = i; - /* Now we have to do the sorting. We can skip looking for the - binary itself which is at the front of the search list for - the main namespace. */ - _dl_sort_maps (maps + (ns == LM_ID_BASE), nmaps - (ns == LM_ID_BASE), - NULL, true); + /* Now we have to do the sorting. */ + _dl_sort_maps (maps, nmaps, true); /* We do not rely on the linked list of loaded object anymore from this point on. We have our own list here (maps). The diff --git a/elf/dl-open.c b/elf/dl-open.c index e18ee398cb..0a448832ac 100644 --- a/elf/dl-open.c +++ b/elf/dl-open.c @@ -301,8 +301,9 @@ dl_open_worker (void *a) if (GLRO(dl_lazy)) reloc_mode |= mode & RTLD_LAZY; - /* Sort the objects by dependency for the relocation process. This - allows IFUNC relocations to work and it also means copy + /* Start relocation process of newly loaded objects, which were sorted by + dependency during _dl_map_object(), results placed in new->l_initfini. + This allows IFUNC relocations to work and it also means copy relocation of dependencies are if necessary overwritten. */ unsigned int nmaps = 0; struct link_map *l = new; @@ -314,16 +315,14 @@ dl_open_worker (void *a) } while (l != NULL); struct link_map *maps[nmaps]; - nmaps = 0; - l = new; - do + int i = 0; + for (struct link_map **ptr = new->l_initfini; *ptr; ptr++) { + l = *ptr; if (! l->l_real->l_relocated) - maps[nmaps++] = l; - l = l->l_next; + maps[i++] = l; } - while (l != NULL); - _dl_sort_maps (maps, nmaps, NULL, false); + assert (i == nmaps); int relocation_in_progress = 0; diff --git a/elf/dl-sort-maps.c b/elf/dl-sort-maps.c index 26b3fd93a3..0c359d3da9 100644 --- a/elf/dl-sort-maps.c +++ b/elf/dl-sort-maps.c @@ -16,107 +16,131 @@ License along with the GNU C Library; if not, see . */ +#include #include +/* We use a recursive function due to its better clarity and ease of + implementation, as well as faster execution speed. We already use + alloca() for list allocation during the breadth-first search of + dependencies in _dl_map_object_deps(), and this should be on the + same order of worst-case stack usage. */ -/* Sort array MAPS according to dependencies of the contained objects. - Array USED, if non-NULL, is permutated along MAPS. If FOR_FINI this is - called for finishing an object. */ -void -_dl_sort_maps (struct link_map **maps, unsigned int nmaps, char *used, - bool for_fini) +static void +dfs_traversal (struct link_map ***rpo, struct link_map *map, + bool *do_reldeps) { - /* A list of one element need not be sorted. */ - if (nmaps <= 1) + if (map->l_visited) return; - unsigned int i = 0; - uint16_t seen[nmaps]; - memset (seen, 0, nmaps * sizeof (seen[0])); - while (1) - { - /* Keep track of which object we looked at this round. */ - ++seen[i]; - struct link_map *thisp = maps[i]; + map->l_visited = 1; - if (__glibc_unlikely (for_fini)) + if (map->l_initfini) + { + for (int i = 0; map->l_initfini[i] != NULL; i++) { - /* Do not handle ld.so in secondary namespaces and objects which - are not removed. */ - if (thisp != thisp->l_real || thisp->l_idx == -1) - goto skip; + struct link_map *dep = map->l_initfini[i]; + if (dep->l_visited == 0) + dfs_traversal (rpo, dep, do_reldeps); } + } + + if (__glibc_unlikely (do_reldeps != NULL && map->l_reldeps != NULL)) + { + /* Indicate that we encountered relocation dependencies during + traversal. */ + *do_reldeps = true; - /* Find the last object in the list for which the current one is - a dependency and move the current object behind the object - with the dependency. */ - unsigned int k = nmaps - 1; - while (k > i) + for (int m = map->l_reldeps->act - 1; m >= 0; m--) { - struct link_map **runp = maps[k]->l_initfini; - if (runp != NULL) - /* Look through the dependencies of the object. */ - while (*runp != NULL) - if (__glibc_unlikely (*runp++ == thisp)) - { - move: - /* Move the current object to the back past the last - object with it as the dependency. */ - memmove (&maps[i], &maps[i + 1], - (k - i) * sizeof (maps[0])); - maps[k] = thisp; - - if (used != NULL) - { - char here_used = used[i]; - memmove (&used[i], &used[i + 1], - (k - i) * sizeof (used[0])); - used[k] = here_used; - } - - if (seen[i + 1] > nmaps - i) - { - ++i; - goto next_clear; - } - - uint16_t this_seen = seen[i]; - memmove (&seen[i], &seen[i + 1], (k - i) * sizeof (seen[0])); - seen[k] = this_seen; - - goto next; - } - - if (__glibc_unlikely (for_fini && maps[k]->l_reldeps != NULL)) - { - unsigned int m = maps[k]->l_reldeps->act; - struct link_map **relmaps = &maps[k]->l_reldeps->list[0]; - - /* Look through the relocation dependencies of the object. */ - while (m-- > 0) - if (__glibc_unlikely (relmaps[m] == thisp)) - { - /* If a cycle exists with a link time dependency, - preserve the latter. */ - struct link_map **runp = thisp->l_initfini; - if (runp != NULL) - while (*runp != NULL) - if (__glibc_unlikely (*runp++ == maps[k])) - goto ignore; - goto move; - } - ignore:; - } - - --k; + struct link_map *dep = map->l_reldeps->list[m]; + if (dep->l_visited == 0) + dfs_traversal (rpo, dep, do_reldeps); } + } + + *rpo -= 1; + **rpo = map; +} - skip: - if (++i == nmaps) - break; - next_clear: - memset (&seen[i], 0, (nmaps - i) * sizeof (seen[0])); +/* Topologically sort array MAPS according to dependencies of the contained + objects. */ - next:; +void +_dl_sort_maps (struct link_map **maps, unsigned int nmaps, bool for_fini) +{ + for (int i = 0; i < nmaps; i++) + maps[i]->l_visited = 0; + + /* We apply DFS traversal for each of maps[i] until the whole total order + is found and we're at the start of the Reverse-Postorder (RPO) sequence, + which is a topological sort. + + We go from maps[nmaps - 1] backwards towards maps[0] at this level. + Due to the breadth-first search (BFS) ordering we receive, going + backwards usually gives a more shallow depth-first recursion depth, + adding more stack usage safety. Also, combined with the natural + processing order of l_initfini[] at each node during DFS, this maintains + an ordering closer to the original link ordering in the sorting results + under most simpler cases. + + Another reason we order the top level backwards, it that maps[0] is + usually exactly the main object of which we're in the midst of + _dl_map_object_deps() processing, and maps[0]->l_initfini[] is still + blank. If we start the traversal from maps[0], since having no + dependencies yet filled in, maps[0] will always be immediately + incorrectly placed at the last place in the order (first in reverse). + Adjusting the order so that maps[0] is last traversed naturally avoids + this problem. + + Further, the old "optimization" of skipping the main object at maps[0] + from the call-site (i.e. _dl_sort_maps(maps+1,nmaps-1)) is in general + no longer valid, since traversing along object dependency-links + may "find" the main object even when it is not included in the initial + order (e.g. a dlopen()'ed shared object can have circular dependencies + linked back to itself). In such a case, traversing N-1 objects will + create a N-object result, and raise problems. + + To summarize, just passing in the full list, and iterating from back + to front makes things much more straightforward. */ + + struct link_map *rpo[nmaps]; + struct link_map **rpo_head = &rpo[nmaps]; + + bool do_reldeps = false; + bool *do_reldeps_ref = (for_fini ? &do_reldeps : NULL); + + for (int i = nmaps - 1; i >= 0; i--) + { + dfs_traversal (&rpo_head, maps[i], do_reldeps_ref); + + /* We can break early if all objects are already placed. */ + if (rpo_head == rpo) + goto end; + } + assert (rpo_head == rpo); + + end: + /* This is avoided if !FOR_FINI or if we didn't find any reldeps in + the first DFS traversal. */ + if (do_reldeps) + { + for (int i = 0; i < nmaps; i++) + rpo[i]->l_visited = 0; + + struct link_map **maps_head = &maps[nmaps]; + for (int i = nmaps - 1; i >= 0; i--) + { + dfs_traversal (&maps_head, rpo[i], NULL); + + /* We can break early if all objects are already placed. + The below memcpy is not needed in the do_reldeps case here, + since we wrote back to maps[] during DFS traversal. */ + if (maps_head == maps) + return; + } + assert (maps_head == maps); + return; } + + memcpy (maps, rpo, sizeof (struct link_map *) * nmaps); } diff --git a/include/link.h b/include/link.h index 736e1d72ae..67ff00c4bc 100644 --- a/include/link.h +++ b/include/link.h @@ -178,6 +178,8 @@ struct link_map unsigned int l_init_called:1; /* Nonzero if DT_INIT function called. */ unsigned int l_global:1; /* Nonzero if object in _dl_global_scope. */ unsigned int l_reserved:2; /* Reserved for internal use. */ + unsigned int l_visited:1; /* Used internally for map dependency + graph traversal. */ unsigned int l_phdr_allocated:1; /* Nonzero if the data structure pointed to by `l_phdr' is allocated. */ unsigned int l_soname_added:1; /* Nonzero if the SONAME is for sure in diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h index b1fc5c31f9..9b6dcc8d5f 100644 --- a/sysdeps/generic/ldsodefs.h +++ b/sysdeps/generic/ldsodefs.h @@ -964,8 +964,8 @@ extern void _dl_init (struct link_map *main_map, int argc, char **argv, extern void _dl_fini (void) attribute_hidden; /* Sort array MAPS according to dependencies of the contained objects. */ -extern void _dl_sort_maps (struct link_map **maps, unsigned int nmaps, - char *used, bool for_fini) attribute_hidden; +extern void _dl_sort_maps (struct link_map **, unsigned int, bool) + attribute_hidden; /* The dynamic linker calls this function before and having changing any shared object mappings. The `r_state' member of `struct r_debug'