Message ID | alpine.LSU.2.20.1806051149210.5043@zhemvz.fhfr.qr |
---|---|
State | New |
Headers | show |
Series | Avoid excessive location expansion in assign_discriminators | expand |
> On testcases like that from PR60243 CFG build is dominated by > assign_discriminators because it expands locations again and again > and this got more expensive over the time. > > Cary - can you explain the overall logic of assign_discriminators, > specifically why if the last stmt of a block has UNKNOWN_LOCATION > we don't need any discriminator? That last stmt inherits the last > location from a previous stmt with location. Also why > do we look at e->dest _last_ stmt in addition to the first stmt? Sorry, it's been a long time since I looked at this. I think that test for UNKNOWN_LOCATION is just a way of punting on an unexpected condition; the rest of the function won't work unless we have a valid locus to start with. > If I understand correctly the assignment is done at CFG construction > time rather than location emission time because we want it done > on a representation close to source? So if the last stmt has > the same line then the first should have as well? As for why we look at last_stmt as well as first_stmt, that has to do with the way for loops are expanded. See my explanation here: https://gcc.gnu.org/ml/gcc-patches/2009-07/msg01450.html > Or, to ask a different question - why is this done so early on > a non-optimized CFG rather than late on RTL right before we > emit .loc directives? It has to be done early because we need to have discriminators assigned for the tree_profile pass, in order to use profile feedback from an earlier run. > It would be nice if you could expand the comment before > assign_discriminators in some way addressing the above. > > The patch below cuts the time spent in assign_discriminators > down by a factor of 2.5. There's obvious cleanup possibilities > for the pointer hash-table given we should rather embed the > int, int pair rather than allocating it on the heap. Couldn't > figure out the appropriate hash-traits quickly enough though. This looks good, except won't the hash table now compare equal for two different locations where the line is the same, but the file is not? In the Google branches, we improved discriminator assignment quite a bit by tracking per instruction instead of per basic block. Here's my original patch to do that: https://gcc.gnu.org/ml/gcc-patches/2009-11/msg00563.html Your improvement to hoist the expansion of locus_e would still be useful there. Unfortunately, I never had the chance to update that patch to preserve the discriminator info across LTO streaming, which is why it remained only in the Google branches. There were a few follow-on patches; the last backport to the google/gcc-4_9 branch combined most of them: https://gcc.gnu.org/ml/gcc-patches/2014-05/msg00869.html and I think there was one more discriminator patch after that: https://gcc.gnu.org/ml/gcc-patches/2014-08/msg02671.html -cary
On Tue, 19 Jun 2018, Cary Coutant wrote: > > On testcases like that from PR60243 CFG build is dominated by > > assign_discriminators because it expands locations again and again > > and this got more expensive over the time. > > > > Cary - can you explain the overall logic of assign_discriminators, > > specifically why if the last stmt of a block has UNKNOWN_LOCATION > > we don't need any discriminator? That last stmt inherits the last > > location from a previous stmt with location. Also why > > do we look at e->dest _last_ stmt in addition to the first stmt? > > Sorry, it's been a long time since I looked at this. I think that test > for UNKNOWN_LOCATION is just a way of punting on an unexpected > condition; the rest of the function won't work unless we have a valid > locus to start with. > > > If I understand correctly the assignment is done at CFG construction > > time rather than location emission time because we want it done > > on a representation close to source? So if the last stmt has > > the same line then the first should have as well? > > As for why we look at last_stmt as well as first_stmt, that has to do > with the way for loops are expanded. See my explanation here: > > https://gcc.gnu.org/ml/gcc-patches/2009-07/msg01450.html Ah, OK. I guess together with the fact that we're working on an unoptimized CFG that makes sense. It basically assumes we have no line back-and-forth jumping at this stage which I'm not 100% sure is a valid assumption. Likewise we shouldn't have UNKNOWN_LOCATION stmts at this point (but again I'm quite sure there are a few cases where we do...). > > Or, to ask a different question - why is this done so early on > > a non-optimized CFG rather than late on RTL right before we > > emit .loc directives? > > It has to be done early because we need to have discriminators > assigned for the tree_profile pass, in order to use profile feedback > from an earlier run. OK. > > It would be nice if you could expand the comment before > > assign_discriminators in some way addressing the above. > > > > The patch below cuts the time spent in assign_discriminators > > down by a factor of 2.5. There's obvious cleanup possibilities > > for the pointer hash-table given we should rather embed the > > int, int pair rather than allocating it on the heap. Couldn't > > figure out the appropriate hash-traits quickly enough though. > > This looks good, except won't the hash table now compare equal for two > different locations where the line is the same, but the file is not? This is what the previous state did as well, it just compared (and hashed) LOCATION_LINE: inline hashval_t locus_discrim_hasher::hash (const locus_discrim_map *item) { - return LOCATION_LINE (item->locus); + return item->location_line; } locus_discrim_hasher::equal (const locus_discrim_map *a, const locus_discrim_map *b) { - return LOCATION_LINE (a->locus) == LOCATION_LINE (b->locus); + return a->location_line == b->location_line; } maybe that wasn't intended (or it was changed in previous revs from sth more sensible). > In the Google branches, we improved discriminator assignment quite a > bit by tracking per instruction instead of per basic block. Here's my > original patch to do that: > > https://gcc.gnu.org/ml/gcc-patches/2009-11/msg00563.html That somehow suggests assigning discriminators after optimizing would help. I don't see discriminator being used in the profile code in the FSF tree btw., just some comments in auto-profile.c that it could be used. So I guess the issue is that you need to match the CFG at profile reading time with the line-numbers associated to the final assembly for sample-based profiling. That indeed sounds like a hard^Wimpossible task. And indeed sth on a basic-block level isn't going to help very much. At least it suggests that we want to assign discriminators at the same point we'll later read the sample-based profile rather than at CFG construction. There are quite a number of optimizations run, including inlining, before pass_ipa_auto_profile. Btw, does this mean that discriminators (and the compile-time used to compute them) are not necessary if sample-based profiling isn't used? Thanks, Richard.
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c index 4a1b2bef570..21b3fdffa59 100644 --- a/gcc/tree-cfg.c +++ b/gcc/tree-cfg.c @@ -110,7 +110,7 @@ struct replace_decls_d /* Hash table to store last discriminator assigned for each locus. */ struct locus_discrim_map { - location_t locus; + int location_line; int discriminator; }; @@ -129,7 +129,7 @@ struct locus_discrim_hasher : free_ptr_hash <locus_discrim_map> inline hashval_t locus_discrim_hasher::hash (const locus_discrim_map *item) { - return LOCATION_LINE (item->locus); + return item->location_line; } /* Equality function for the locus-to-discriminator map. A and B @@ -139,7 +139,7 @@ inline bool locus_discrim_hasher::equal (const locus_discrim_map *a, const locus_discrim_map *b) { - return LOCATION_LINE (a->locus) == LOCATION_LINE (b->locus); + return a->location_line == b->location_line; } static hash_table<locus_discrim_hasher> *discriminator_per_locus; @@ -1168,21 +1168,20 @@ gimple_find_sub_bbs (gimple_seq seq, gimple_stmt_iterator *gsi) profiling. */ static int -next_discriminator_for_locus (location_t locus) +next_discriminator_for_locus (int line) { struct locus_discrim_map item; struct locus_discrim_map **slot; - item.locus = locus; + item.location_line = line; item.discriminator = 0; - slot = discriminator_per_locus->find_slot_with_hash ( - &item, LOCATION_LINE (locus), INSERT); + slot = discriminator_per_locus->find_slot_with_hash (&item, line, INSERT); gcc_assert (slot); if (*slot == HTAB_EMPTY_ENTRY) { *slot = XNEW (struct locus_discrim_map); gcc_assert (*slot); - (*slot)->locus = locus; + (*slot)->location_line = line; (*slot)->discriminator = 0; } (*slot)->discriminator++; @@ -1192,23 +1191,22 @@ next_discriminator_for_locus (location_t locus) /* Return TRUE if LOCUS1 and LOCUS2 refer to the same source line. */ static bool -same_line_p (location_t locus1, location_t locus2) +same_line_p (location_t locus1, expanded_location *from, location_t locus2) { - expanded_location from, to; + expanded_location to; if (locus1 == locus2) return true; - from = expand_location (locus1); to = expand_location (locus2); - if (from.line != to.line) + if (from->line != to.line) return false; - if (from.file == to.file) + if (from->file == to.file) return true; - return (from.file != NULL + return (from->file != NULL && to.file != NULL - && filename_cmp (from.file, to.file) == 0); + && filename_cmp (from->file, to.file) == 0); } /* Assign discriminators to each basic block. */ @@ -1228,17 +1226,23 @@ assign_discriminators (void) if (locus == UNKNOWN_LOCATION) continue; + expanded_location locus_e = expand_location (locus); + FOR_EACH_EDGE (e, ei, bb->succs) { gimple *first = first_non_label_stmt (e->dest); gimple *last = last_stmt (e->dest); - if ((first && same_line_p (locus, gimple_location (first))) - || (last && same_line_p (locus, gimple_location (last)))) + if ((first && same_line_p (locus, &locus_e, + gimple_location (first))) + || (last && same_line_p (locus, &locus_e, + gimple_location (last)))) { if (e->dest->discriminator != 0 && bb->discriminator == 0) - bb->discriminator = next_discriminator_for_locus (locus); + bb->discriminator + = next_discriminator_for_locus (locus_e.line); else - e->dest->discriminator = next_discriminator_for_locus (locus); + e->dest->discriminator + = next_discriminator_for_locus (locus_e.line); } } }