Patchwork mem_attrs_htab

login
register
mail settings
Submitter Dimitrios Apostolou
Date Aug. 22, 2011, 7:32 a.m.
Message ID <alpine.LNX.2.02.1108221027320.1374@localhost.localdomain>
Download mbox | patch
Permalink /patch/110860/
State New
Headers show

Comments

Dimitrios Apostolou - Aug. 22, 2011, 7:32 a.m.
2011-08-22  Dimitrios Apostolou  <jimis@gmx.net>

 	* emit-rtl.c (mem_attrs_htab_hash): Hash massively by calling
 	iterative_hash(). We disregard the offset,size rtx fields of the
 	mem_attrs struct, but overall this hash is a *huge* improvement to
 	the previous one, it reduces the collisions/searches ratio from 8
 	to 0.8 for some cases.
 	(init_emit_once): Slightly increase the mem_attrs_htab initial
 	size because it's frequently used and expanded many times.
Jakub Jelinek - Aug. 22, 2011, 7:49 a.m.
On Mon, Aug 22, 2011 at 10:32:35AM +0300, Dimitrios Apostolou wrote:
> --- gcc/emit-rtl.c	2011-05-29 17:40:05 +0000
> +++ gcc/emit-rtl.c	2011-08-21 04:44:25 +0000
> @@ -256,11 +256,10 @@ mem_attrs_htab_hash (const void *x)
>  {
>    const mem_attrs *const p = (const mem_attrs *) x;
>  
> -  return (p->alias ^ (p->align * 1000)
> -	  ^ (p->addrspace * 4000)
> -	  ^ ((p->offset ? INTVAL (p->offset) : 0) * 50000)
> -	  ^ ((p->size ? INTVAL (p->size) : 0) * 2500000)
> -	  ^ (size_t) iterative_hash_expr (p->expr, 0));
> +  /* By massively feeding the mem_attrs struct to iterative_hash() we
> +     disregard the p->offset and p->size rtx, but in total the hash is
> +     quick and good enough. */
> +  return iterative_hash_object (*p, iterative_hash_expr (p->expr, 0));
>  }
>  
>  /* Returns nonzero if the value represented by X (which is really a

This patch isn't against the trunk, where p->offset and p->size aren't rtxes
anymore, but HOST_WIDE_INTs.  Furthermore, it is a bad idea to hash
the p->expr address itself, it doesn't make any sense to hash on what
p->expr points to in that case.  And p->offset and p->size should be ignored
if the *known_p corresponding fields are false.  So, if you really think
using iterative_hash_object is a win, it should be something like:
  mem_attrs q = *p;
  q.expr = NULL;
  if (!q.offset_known_p) q.offset = 0;
  if (!q.size_known_p) q.size = 0;
  return iterative_hash_object (q, iterative_hash_expr (p->expr, 0));
(or better yet avoid q.expr = NULL and instead start hashing from the next
field after expr).  Hashing the struct padding might not be a good idea
either.

	Jakub
Dimitrios Apostolou - Aug. 22, 2011, 7:58 a.m.
Hi Jakub,

I forgot to mention that all patches are against mid-July trunk, I was 
hoping I'd have no conflicts. Anyway thanks for letting me know, 
if there are conflicts with my other patches please let me know, and I'll 
post an updated version at a later date.

All your other concerns are valid and I'll try addressing them in the 
future. I didn't like hashing addresses either, and I was surprised I saw 
no regressions.


Dimitris


>
> This patch isn't against the trunk, where p->offset and p->size aren't rtxes
> anymore, but HOST_WIDE_INTs.  Furthermore, it is a bad idea to hash
> the p->expr address itself, it doesn't make any sense to hash on what
> p->expr points to in that case.  And p->offset and p->size should be ignored
> if the *known_p corresponding fields are false.  So, if you really think
> using iterative_hash_object is a win, it should be something like:
>  mem_attrs q = *p;
>  q.expr = NULL;
>  if (!q.offset_known_p) q.offset = 0;
>  if (!q.size_known_p) q.size = 0;
>  return iterative_hash_object (q, iterative_hash_expr (p->expr, 0));
> (or better yet avoid q.expr = NULL and instead start hashing from the next
> field after expr).  Hashing the struct padding might not be a good idea
> either.
>
> 	Jakub
>
Jakub Jelinek - Aug. 22, 2011, 8:04 a.m.
On Mon, Aug 22, 2011 at 10:58:48AM +0300, Dimitrios Apostolou wrote:
> the future. I didn't like hashing addresses either, and I was
> surprised I saw no regressions.

Hashing on the expr address as well just results in smaller sharing
in the hash table (i.e. if the expr has different address, but is considered
equal).  The hashing of mem attrs is done just to reduce memory overhead.

	Jakub
Richard Guenther - Aug. 22, 2011, 9:57 a.m.
On Mon, Aug 22, 2011 at 10:04 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Aug 22, 2011 at 10:58:48AM +0300, Dimitrios Apostolou wrote:
>> the future. I didn't like hashing addresses either, and I was
>> surprised I saw no regressions.
>
> Hashing on the expr address as well just results in smaller sharing
> in the hash table (i.e. if the expr has different address, but is considered
> equal).  The hashing of mem attrs is done just to reduce memory overhead.

And at some point the idea popped up to just dump this whole re-using
mem-attrs.  There is at most a single function in RTL but the whole program
is available in SSA, so the memory overhead must be small.

Richard.

>        Jakub
>
Jakub Jelinek - Aug. 22, 2011, 10:07 a.m.
On Mon, Aug 22, 2011 at 11:57:18AM +0200, Richard Guenther wrote:
> And at some point the idea popped up to just dump this whole re-using
> mem-attrs.  There is at most a single function in RTL but the whole program
> is available in SSA, so the memory overhead must be small.

Some functions are extremely large though.  Do you mean that MEM itself would be
enlarged to have the MEM_ATTRS field so that one operand is the address,
then expr, then HWI size, offset, etc.?  Because if the mem attrs aren't
shared any longer, it doesn't make sense to keep the indirection.
I still fear we have way too many MEMs in RTL that this would be noticeable.

	Jakub
Richard Guenther - Aug. 22, 2011, 10:34 a.m.
On Mon, Aug 22, 2011 at 12:07 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Aug 22, 2011 at 11:57:18AM +0200, Richard Guenther wrote:
>> And at some point the idea popped up to just dump this whole re-using
>> mem-attrs.  There is at most a single function in RTL but the whole program
>> is available in SSA, so the memory overhead must be small.
>
> Some functions are extremely large though.  Do you mean that MEM itself would be
> enlarged to have the MEM_ATTRS field so that one operand is the address,
> then expr, then HWI size, offset, etc.?  Because if the mem attrs aren't
> shared any longer, it doesn't make sense to keep the indirection.
> I still fear we have way too many MEMs in RTL that this would be noticeable.

It would be interesting to have numbers about the amount of sharing that
happens - might be not trivial though, as some re-uses would be able to
simply modify the attr inplace.

Richard.

>        Jakub
>
Michael Matz - Aug. 22, 2011, 3:46 p.m.
Hi,

On Mon, 22 Aug 2011, Richard Guenther wrote:

> > Some functions are extremely large though.  Do you mean that MEM 
> > itself would be enlarged to have the MEM_ATTRS field so that one 
> > operand is the address, then expr, then HWI size, offset, etc.? 
> >  Because if the mem attrs aren't shared any longer, it doesn't make 
> > sense to keep the indirection. I still fear we have way too many MEMs 
> > in RTL that this would be noticeable.
> 
> It would be interesting to have numbers about the amount of sharing that 
> happens

A pathetic amount.  From compiling cse.c combine.c tree.c and dwarf2out.c 
the top 10 users of MEMs per routine are:

With -O0:
MEMs  ATTR  name
970   485   combine_simplify_rtx
1011  464   simple_cst_equal
1047  612   mem_loc_descriptor
1173  442   walk_tree_1
1296  515   loc_list_from_tree
1431  690   simplify_comparison
1911  752   substitute_placeholder_in_expr
1951  745   substitute_in_expr
2503  1532  cse_insn
3242  2206  try_combine

With -O2:
MEMs  ATTR  name
514   502   gen_tagged_type_die
701   536   simplify_comparison
743   877   find_decls_types_r
851   840   dwarf2out_finish
863   784   loc_list_from_tree
916   839   combine_simplify_rtx
978   878   gen_subprogram_die
1650  1475  cse_insn
1720  1782  mem_loc_descriptor
2336  1792  try_combine

Summing doesn't make sense, but the routines with largest differences:
-O0
532 force_to_mode
547 simple_cst_equal
640 simplify_shift_const_1
731 walk_tree_1
741 simplify_comparison
781 loc_list_from_tree
971 cse_insn
1036 try_combine
1159 substitute_placeholder_in_expr
1206 substitute_in_expr

-O2
100 gen_subprogram_die
101 make_extraction
112 output_loc_sequence
122 if_then_else_cond
124 substitute_placeholder_in_expr
144 simplify_shift_const_1
165 simplify_comparison
175 cse_insn
205 simplify_if_then_else
544 try_combine

(Using -g or not doesn't make a difference).  I've counted all MEM rtx in 
the whole insn stream at finalization time (i.e. slightly less than 
potentially are actually generated during RTL passes).  ATTR is the number 
of unique mem_attrs ever created by set_mem_attrs, reset to zero at each 
function start (including emptying the htab).

That is, we save a whopping 48 kilobyte due to this fantastic hash table 
:-)  (offseted by the need for a pointer in the MEM rtx)

Just remove the whole thing.  Same for the reg_attrs hash table (I haven't 
measured that one, though).

> - might be not trivial though, as some re-uses would be able to 
> simply modify the attr inplace.


Ciao,
Michael.

Patch

=== modified file 'gcc/emit-rtl.c'
--- gcc/emit-rtl.c	2011-05-29 17:40:05 +0000

+++ gcc/emit-rtl.c	2011-08-21 04:44:25 +0000

@@ -256,11 +256,10 @@  mem_attrs_htab_hash (const void *x)

 {
   const mem_attrs *const p = (const mem_attrs *) x;
 
-  return (p->alias ^ (p->align * 1000)

-	  ^ (p->addrspace * 4000)

-	  ^ ((p->offset ? INTVAL (p->offset) : 0) * 50000)

-	  ^ ((p->size ? INTVAL (p->size) : 0) * 2500000)

-	  ^ (size_t) iterative_hash_expr (p->expr, 0));

+  /* By massively feeding the mem_attrs struct to iterative_hash() we

+     disregard the p->offset and p->size rtx, but in total the hash is

+     quick and good enough. */

+  return iterative_hash_object (*p, iterative_hash_expr (p->expr, 0));

 }
 
 /* Returns nonzero if the value represented by X (which is really a
@@ -5494,7 +5500,7 @@  init_emit_once (void)

   const_fixed_htab = htab_create_ggc (37, const_fixed_htab_hash,
 				      const_fixed_htab_eq, NULL);
 
-  mem_attrs_htab = htab_create_ggc (37, mem_attrs_htab_hash,

+  mem_attrs_htab = htab_create_ggc (128, mem_attrs_htab_hash,

 				    mem_attrs_htab_eq, NULL);
   reg_attrs_htab = htab_create_ggc (37, reg_attrs_htab_hash,
 				    reg_attrs_htab_eq, NULL);