diff mbox

[2/2] Fix expansion slowness of PR60291

Message ID alpine.LSU.2.11.1402211120420.27942@zhemvz.fhfr.qr
State New
Headers show

Commit Message

Richard Biener Feb. 21, 2014, 10:25 a.m. UTC
On Fri, 21 Feb 2014, Richard Biener wrote:

> 
> This fixes the slowness of RTL expansion in PR60291 which is caused
> by excessive collisions in mem-attr sharing.  The issue here is
> that sharing attempts happens across functions and we have a _lot_
> of functions in this testcase referencing the same lexically
> equivalent memory, for example MEM[(StgWord *)_5 + -64B].  That
> means those get the same hash value.  But they don't compare
> equal because an SSA name _5 from function A is of course not equal
> to one from function B.
> 
> The following fixes that by not doing mem-attr sharing across functions
> by clearing the mem-attrs hashtable in rest_of_clean_state.
> 
> Another fix may be to do what the comment in iterative_hash_expr
> says for SSA names:
> 
>     case SSA_NAME:
>       /* We can just compare by pointer.  */
>       return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
> 
> (probably blame me for changing that to hashing the SSA version).

It was lxo.

> But I'm not sure that doesn't uncover issues with various hashtables and
> walking them, generating code dependent on the order.  It's IMHO just not
> expected that you compare function-local expressions from different
> functions.

Same speedup result from


better than hashing pointers but requring cfun != NULL in this
function isn't good either.

At this point I'm more comfortable with clearing the hashtable
than with changing iterative_hash_expr in any way.  It's also
along the way to get rid of the hash completely.

Oh, btw, the speedup is going from

 expand                  : 481.98 (94%) usr   1.15 (17%) sys 481.94 (93%) 
wall  293891 kB (15%) ggc

to

 expand                  :   2.66 ( 7%) usr   0.13 ( 2%) sys   2.64 ( 6%) 
wall  262544 kB (13%) ggc

at -O0 (less dramatic slowness for -On).

> The other thing would be to discard mem-attr sharing alltogether,
> but that doesn't seem appropriate at this stage (but it would
> also simplify quite some code).  With only one function in RTL
> at a time that shouldn't be too bad (see several suggestions
> along that line, even with statistics).
> 
> Bootstrap / regtest running on x86_64-unknown-linux-gnu, ok for
> trunk and 4.8 branch?
> 
> Thanks,
> Richard.
> 
> 2014-02-21  Richard Biener  <rguenther@suse.de>
> 
> 	PR middle-end/60291
> 	* rtl.h (clear_mem_attrs_htab): Declare.
> 	* emit-rtl.c (clear_mem_attrs_htab): New function.
> 	* final.c (rest_of_clean_state): Call clear_mem_attrs_htab
> 	to avoid sharing mem-attrs between functions.
> 
> Index: gcc/rtl.h
> ===================================================================
> *** gcc/rtl.h	(revision 207960)
> --- gcc/rtl.h	(working copy)
> *************** extern int in_sequence_p (void);
> *** 2546,2551 ****
> --- 2546,2552 ----
>   extern void init_emit (void);
>   extern void init_emit_regs (void);
>   extern void init_emit_once (void);
> + extern void clear_mem_attrs_htab (void);
>   extern void push_topmost_sequence (void);
>   extern void pop_topmost_sequence (void);
>   extern void set_new_first_and_last_insn (rtx, rtx);
> Index: gcc/emit-rtl.c
> ===================================================================
> *** gcc/emit-rtl.c	(revision 207960)
> --- gcc/emit-rtl.c	(working copy)
> *************** init_emit_once (void)
> *** 5913,5918 ****
> --- 5913,5926 ----
>     simple_return_rtx = gen_rtx_fmt_ (SIMPLE_RETURN, VOIDmode);
>     cc0_rtx = gen_rtx_fmt_ (CC0, VOIDmode);
>   }
> + 
> + /* Clear the mem-attrs sharing hash table.  */
> + 
> + void
> + clear_mem_attrs_htab (void)
> + {
> +   htab_empty (mem_attrs_htab);
> + }
>   
>   /* Produce exact duplicate of insn INSN after AFTER.
>      Care updating of libcall regions if present.  */
> Index: gcc/final.c
> ===================================================================
> *** gcc/final.c	(revision 207960)
> --- gcc/final.c	(working copy)
> *************** rest_of_clean_state (void)
> *** 4678,4683 ****
> --- 4678,4686 ----
>   
>     init_recog_no_volatile ();
>   
> +   /* Reset mem-attrs sharing.  */
> +   clear_mem_attrs_htab ();
> + 
>     /* We're done with this function.  Free up memory if we can.  */
>     free_after_parsing (cfun);
>     free_after_compilation (cfun);
>

Comments

Richard Biener Feb. 21, 2014, 11:24 a.m. UTC | #1
On Fri, 21 Feb 2014, Richard Biener wrote:

> On Fri, 21 Feb 2014, Richard Biener wrote:
> 
> > 
> > This fixes the slowness of RTL expansion in PR60291 which is caused
> > by excessive collisions in mem-attr sharing.  The issue here is
> > that sharing attempts happens across functions and we have a _lot_
> > of functions in this testcase referencing the same lexically
> > equivalent memory, for example MEM[(StgWord *)_5 + -64B].  That
> > means those get the same hash value.  But they don't compare
> > equal because an SSA name _5 from function A is of course not equal
> > to one from function B.
> > 
> > The following fixes that by not doing mem-attr sharing across functions
> > by clearing the mem-attrs hashtable in rest_of_clean_state.
> > 
> > Another fix may be to do what the comment in iterative_hash_expr
> > says for SSA names:
> > 
> >     case SSA_NAME:
> >       /* We can just compare by pointer.  */
> >       return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
> > 
> > (probably blame me for changing that to hashing the SSA version).
> 
> It was lxo.
> 
> > But I'm not sure that doesn't uncover issues with various hashtables and
> > walking them, generating code dependent on the order.  It's IMHO just not
> > expected that you compare function-local expressions from different
> > functions.
> 
> Same speedup result from
> 
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c  (revision 207960)
> +++ gcc/tree.c  (working copy)
> @@ -7428,7 +7428,7 @@ iterative_hash_expr (const_tree t, hashv
>        }
>      case SSA_NAME:
>        /* We can just compare by pointer.  */
> -      return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
> +      return iterative_hash_hashval_t ((uintptr_t)t>>3, val);
>      case PLACEHOLDER_EXPR:
>        /* The node itself doesn't matter.  */
>        return val;
> 
> and from
> 
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c  (revision 207960)
> +++ gcc/tree.c  (working copy)
> @@ -7428,7 +7428,9 @@ iterative_hash_expr (const_tree t, hashv
>        }
>      case SSA_NAME:
>        /* We can just compare by pointer.  */
> -      return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
> +      return iterative_hash_host_wide_int
> +             (DECL_UID (cfun->decl),
> +              iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val));
>      case PLACEHOLDER_EXPR:
>        /* The node itself doesn't matter.  */
>        return val;
> 
> better than hashing pointers but requring cfun != NULL in this
> function isn't good either.
> 
> At this point I'm more comfortable with clearing the hashtable
> than with changing iterative_hash_expr in any way.  It's also
> along the way to get rid of the hash completely.
> 
> Oh, btw, the speedup is going from
> 
>  expand                  : 481.98 (94%) usr   1.15 (17%) sys 481.94 (93%) 
> wall  293891 kB (15%) ggc
> 
> to
> 
>  expand                  :   2.66 ( 7%) usr   0.13 ( 2%) sys   2.64 ( 6%) 
> wall  262544 kB (13%) ggc
> 
> at -O0 (less dramatic slowness for -On).
> 
> > The other thing would be to discard mem-attr sharing alltogether,
> > but that doesn't seem appropriate at this stage (but it would
> > also simplify quite some code).  With only one function in RTL
> > at a time that shouldn't be too bad (see several suggestions
> > along that line, even with statistics).

Last statistics: http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01784.html

Richard.
diff mbox

Patch

Index: gcc/tree.c
===================================================================
--- gcc/tree.c  (revision 207960)
+++ gcc/tree.c  (working copy)
@@ -7428,7 +7428,7 @@  iterative_hash_expr (const_tree t, hashv
       }
     case SSA_NAME:
       /* We can just compare by pointer.  */
-      return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
+      return iterative_hash_hashval_t ((uintptr_t)t>>3, val);
     case PLACEHOLDER_EXPR:
       /* The node itself doesn't matter.  */
       return val;

and from

Index: gcc/tree.c
===================================================================
--- gcc/tree.c  (revision 207960)
+++ gcc/tree.c  (working copy)
@@ -7428,7 +7428,9 @@  iterative_hash_expr (const_tree t, hashv
       }
     case SSA_NAME:
       /* We can just compare by pointer.  */
-      return iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val);
+      return iterative_hash_host_wide_int
+             (DECL_UID (cfun->decl),
+              iterative_hash_host_wide_int (SSA_NAME_VERSION (t), val));
     case PLACEHOLDER_EXPR:
       /* The node itself doesn't matter.  */
       return val;