Message ID | alpine.LSU.2.11.1402211302010.27942@zhemvz.fhfr.qr |
---|---|
State | New |
Headers | show |
Richard Biener <rguenther@suse.de> writes: > On Fri, 21 Feb 2014, Richard Biener wrote: > >> On Fri, 21 Feb 2014, Richard Biener wrote: >> >> > On Fri, 21 Feb 2014, Richard Biener wrote: >> > >> > > >> > > This fixes the slowness of RTL expansion in PR60291 which is caused >> > > by excessive collisions in mem-attr sharing. The issue here is >> > > that sharing attempts happens across functions and we have a _lot_ >> > > of functions in this testcase referencing the same lexically >> > > equivalent memory, for example MEM[(StgWord *)_5 + -64B]. That >> > > means those get the same hash value. But they don't compare >> > > equal because an SSA name _5 from function A is of course not equal >> > > to one from function B. >> > > >> > > The following fixes that by not doing mem-attr sharing across functions >> > > by clearing the mem-attrs hashtable in rest_of_clean_state. >> > > >> > > Another fix may be to do what the comment in iterative_hash_expr >> > > says for SSA names: >> > > >> > > case SSA_NAME: >> > > /* We can just compare by pointer. */ >> > > return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val); >> > > >> > > (probably blame me for changing that to hashing the SSA version). >> > >> > It was lxo. >> > >> > > But I'm not sure that doesn't uncover issues with various hashtables and >> > > walking them, generating code dependent on the order. It's IMHO just not >> > > expected that you compare function-local expressions from different >> > > functions. >> > >> > Same speedup result from >> > >> > Index: gcc/tree.c >> > =================================================================== >> > --- gcc/tree.c (revision 207960) >> > +++ gcc/tree.c (working copy) >> > @@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv >> > } >> > case SSA_NAME: >> > /* We can just compare by pointer. */ >> > - return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val); >> > + return iterative_hash_hashval_t ((uintptr_t)t>>3, val); >> > case PLACEHOLDER_EXPR: >> > /* The node itself doesn't matter. */ >> > return val; >> > >> > and from >> > >> > Index: gcc/tree.c >> > =================================================================== >> > --- gcc/tree.c (revision 207960) >> > +++ gcc/tree.c (working copy) >> > @@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv >> > } >> > case SSA_NAME: >> > /* We can just compare by pointer. */ >> > - return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val); >> > + return iterative_hash_host_wide_int >> > + (DECL_UID (cfun->decl), >> > + iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val)); >> > case PLACEHOLDER_EXPR: >> > /* The node itself doesn't matter. */ >> > return val; >> > >> > better than hashing pointers but requring cfun != NULL in this >> > function isn't good either. >> > >> > At this point I'm more comfortable with clearing the hashtable >> > than with changing iterative_hash_expr in any way. It's also >> > along the way to get rid of the hash completely. >> > >> > Oh, btw, the speedup is going from >> > >> > expand : 481.98 (94%) usr 1.15 (17%) sys 481.94 (93%) >> > wall 293891 kB (15%) ggc >> > >> > to >> > >> > expand : 2.66 ( 7%) usr 0.13 ( 2%) sys 2.64 ( 6%) >> > wall 262544 kB (13%) ggc >> > >> > at -O0 (less dramatic slowness for -On). >> > >> > > The other thing would be to discard mem-attr sharing alltogether, >> > > but that doesn't seem appropriate at this stage (but it would >> > > also simplify quite some code). With only one function in RTL >> > > at a time that shouldn't be too bad (see several suggestions >> > > along that line, even with statistics). >> >> Last statistics: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01784.html > > With the patch below to get some statistics we see that one important > piece of sharing not covered by above measurements is RTX copying(?). > > On the testcase for this PR I get at -O1 and without the patch > to clear the hashtable after each function > > 142489 mem_attrs created (142439 for new, 50 for modification) > 1983225 mem_attrs shared (4044 for new, 820241 for modification, 1158940 > by rtx copying) > 0 mem_attrs dropped > > and with the patch to clear after each function > > 364411 mem_attrs created (144478 for new, 219933 for modification) > 1761303 mem_attrs shared (2005 for new, 600358 for modification, 1158940 > by rtx copying) > 0 mem_attrs dropped > > while for dwarf2out.c I see without the clearing > > 24399 mem_attrs created (6929 for new, 17470 for modification) > 102676 mem_attrs shared (10878 for new, 29265 for modification, 62533 by > rtx copying) > 16 mem_attrs dropped > > which means that completely dropping the sharing would result > in creating of 6929 + 17807 + 62533(!) vs. 24399 mem-attrs. > That's still not a lot overhead given that mem-attrs take 40 bytes > (3MB vs. 950kB). There is also always the possibility to > explicitely ref-count mem-attrs to handle sharing by rtx > copying (at least cse, fwprop, combine, ira and reload copy MEMs, > probably some for no good reason because MEMs are not shared), > thus make mem-attrs unshare-on-modify. In a thread a few years ago you talked about the possibility of going further and folding the attributes into the MEM itself, so avoiding the indirection and separate allocation: http://thread.gmane.org/gmane.comp.gcc.patches/244464/focus=244538 (and earlier posts in the thread). Would that still be OK? I might have a go if so. Thanks, Richard
Sir, I have a cross compiler and I know how to cross compile a file . But I am doing all just for glib compilation that I do not know how to do. Anyone to guide me. or generally just inform me how can I compile a complete library using gcc. -- View this message in context: http://gcc.1065356.n5.nabble.com/PATCH-2-2-Fix-expansion-slowness-of-PR60291-tp1013329p1013362.html Sent from the gcc - patches mailing list archive at Nabble.com.
Index: gcc/rtl.c =================================================================== --- gcc/rtl.c (revision 207938) +++ gcc/rtl.c (working copy) @@ -326,6 +326,8 @@ copy_rtx (rtx orig) return copy; } +unsigned long mem_attrs_shared_copy; + /* Create a new copy of an rtx. Only copy just one level. */ rtx @@ -333,6 +335,8 @@ shallow_copy_rtx_stat (const_rtx orig ME { const unsigned int size = rtx_size (orig); rtx const copy = ggc_alloc_rtx_def_stat (size PASS_MEM_STAT); + if (MEM_P (orig) && MEM_ATTRS (orig)) + mem_attrs_shared_copy++; return (rtx) memcpy (copy, orig, size); } Index: gcc/emit-rtl.c =================================================================== --- gcc/emit-rtl.c (revision 207938) +++ gcc/emit-rtl.c (working copy) @@ -290,6 +290,12 @@ mem_attrs_htab_eq (const void *x, const return mem_attrs_eq_p ((const mem_attrs *) x, (const mem_attrs *) y); } +unsigned long mem_attrs_dropped; +unsigned long mem_attrs_new; +unsigned long mem_attrs_new_modified; +unsigned long mem_attrs_shared; +unsigned long mem_attrs_shared_modified; + /* Set MEM's memory attributes so that they are the same as ATTRS. */ static void @@ -300,6 +306,8 @@ set_mem_attrs (rtx mem, mem_attrs *attrs /* If everything is the default, we can just clear the attributes. */ if (mem_attrs_eq_p (attrs, mode_mem_attrs[(int) GET_MODE (mem)])) { + if (MEM_ATTRS (mem)) + mem_attrs_dropped++; MEM_ATTRS (mem) = 0; return; } @@ -309,6 +317,17 @@ set_mem_attrs (rtx mem, mem_attrs *attrs { *slot = ggc_alloc_mem_attrs (); memcpy (*slot, attrs, sizeof (mem_attrs)); + if (MEM_ATTRS (mem)) + mem_attrs_new_modified++; + else + mem_attrs_new++; + } + else + { + if (MEM_ATTRS (mem)) + mem_attrs_shared_modified++; + else + mem_attrs_shared++; } MEM_ATTRS (mem) = (mem_attrs *) *slot; Index: gcc/toplev.c =================================================================== --- gcc/toplev.c (revision 207938) +++ gcc/toplev.c (working copy) @@ -1989,6 +2023,26 @@ toplev_main (int argc, char **argv) if (!exit_after_options) do_compile (); + { + extern unsigned long mem_attrs_dropped; + extern unsigned long mem_attrs_new; + extern unsigned long mem_attrs_new_modified; + extern unsigned long mem_attrs_shared; + extern unsigned long mem_attrs_shared_modified; + extern unsigned long mem_attrs_shared_copy; + fprintf (stderr, "%lu mem_attrs created (%lu for new, %lu for " + "modification)\n", + mem_attrs_new + mem_attrs_new_modified, + mem_attrs_new, mem_attrs_new_modified); + fprintf (stderr, "%lu mem_attrs shared (%lu for new, %lu for " + "modification, %lu by rtx copying)\n", + mem_attrs_shared + mem_attrs_shared_modified + + mem_attrs_shared_copy, + mem_attrs_shared, mem_attrs_shared_modified, + mem_attrs_shared_copy); + fprintf (stderr, "%lu mem_attrs dropped\n", mem_attrs_dropped); + } + if (warningcount || errorcount || werrorcount) print_ignored_options ();