Message ID: alpine.LNX.2.02.1108221027320.1374@localhost.localdomain
State:      New
On Mon, Aug 22, 2011 at 10:32:35AM +0300, Dimitrios Apostolou wrote:
> --- gcc/emit-rtl.c	2011-05-29 17:40:05 +0000
> +++ gcc/emit-rtl.c	2011-08-21 04:44:25 +0000
> @@ -256,11 +256,10 @@ mem_attrs_htab_hash (const void *x)
>  {
>    const mem_attrs *const p = (const mem_attrs *) x;
>
> -  return (p->alias ^ (p->align * 1000)
> -	  ^ (p->addrspace * 4000)
> -	  ^ ((p->offset ? INTVAL (p->offset) : 0) * 50000)
> -	  ^ ((p->size ? INTVAL (p->size) : 0) * 2500000)
> -	  ^ (size_t) iterative_hash_expr (p->expr, 0));
> +  /* By massively feeding the mem_attrs struct to iterative_hash() we
> +     disregard the p->offset and p->size rtx, but in total the hash is
> +     quick and good enough.  */
> +  return iterative_hash_object (*p, iterative_hash_expr (p->expr, 0));
>  }
>
>  /* Returns nonzero if the value represented by X (which is really a

This patch isn't against the trunk, where p->offset and p->size aren't
rtxes anymore, but HOST_WIDE_INTs.  Furthermore, it is a bad idea to hash
the p->expr address itself, it doesn't make any sense to hash on what
p->expr points to in that case.  And p->offset and p->size should be
ignored if the corresponding *known_p fields are false.  So, if you really
think using iterative_hash_object is a win, it should be something like:

  mem_attrs q = *p;
  q.expr = NULL;
  if (!q.offset_known_p) q.offset = 0;
  if (!q.size_known_p) q.size = 0;
  return iterative_hash_object (q, iterative_hash_expr (p->expr, 0));

(or better yet avoid q.expr = NULL and instead start hashing from the next
field after expr).  Hashing the struct padding might not be a good idea
either.

	Jakub
Hi Jakub,

I forgot to mention that all patches are against mid-July trunk, I was
hoping I'd have no conflicts.  Anyway thanks for letting me know, if
there are conflicts with my other patches please let me know, and I'll
post an updated version at a later date.

All your other concerns are valid and I'll try addressing them in the
future.  I didn't like hashing addresses either, and I was surprised I
saw no regressions.


Dimitris
On Mon, Aug 22, 2011 at 10:58:48AM +0300, Dimitrios Apostolou wrote:
> the future.  I didn't like hashing addresses either, and I was
> surprised I saw no regressions.

Hashing on the expr address as well just results in smaller sharing
in the hash table (i.e. if the expr has a different address, but is
considered equal).  The hashing of mem attrs is done just to reduce
memory overhead.

	Jakub
On Mon, Aug 22, 2011 at 10:04 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Aug 22, 2011 at 10:58:48AM +0300, Dimitrios Apostolou wrote:
>> the future.  I didn't like hashing addresses either, and I was
>> surprised I saw no regressions.
>
> Hashing on the expr address as well just results in smaller sharing
> in the hash table (i.e. if the expr has different address, but is
> considered equal).  The hashing of mem attrs is done just to reduce
> memory overhead.

And at some point the idea popped up to just dump this whole re-using
of mem-attrs.  There is at most a single function in RTL but the whole
program is available in SSA, so the memory overhead must be small.

Richard.
On Mon, Aug 22, 2011 at 11:57:18AM +0200, Richard Guenther wrote:
> And at some point the idea popped up to just dump this whole re-using
> mem-attrs.  There is at most a single function in RTL but the whole
> program is available in SSA, so the memory overhead must be small.

Some functions are extremely large though.  Do you mean that MEM itself
would be enlarged to have the MEM_ATTRS fields, so that one operand is
the address, then expr, then HWI size, offset, etc.?  Because if the
mem attrs aren't shared any longer, it doesn't make sense to keep the
indirection.  I still fear we have way too many MEMs in RTL for this
not to be noticeable.

	Jakub
On Mon, Aug 22, 2011 at 12:07 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Aug 22, 2011 at 11:57:18AM +0200, Richard Guenther wrote:
>> And at some point the idea popped up to just dump this whole re-using
>> mem-attrs.  There is at most a single function in RTL but the whole
>> program is available in SSA, so the memory overhead must be small.
>
> Some functions are extremely large though.  Do you mean that MEM itself
> would be enlarged to have the MEM_ATTRS fields, so that one operand is
> the address, then expr, then HWI size, offset, etc.?  Because if the
> mem attrs aren't shared any longer, it doesn't make sense to keep the
> indirection.  I still fear we have way too many MEMs in RTL for this
> not to be noticeable.

It would be interesting to have numbers about the amount of sharing
that happens - might be not trivial though, as some re-uses would be
able to simply modify the attr in place.

Richard.
Hi,

On Mon, 22 Aug 2011, Richard Guenther wrote:

> > Some functions are extremely large though.  Do you mean that MEM
> > itself would be enlarged to have the MEM_ATTRS field so that one
> > operand is the address, then expr, then HWI size, offset, etc.?
> > Because if the mem attrs aren't shared any longer, it doesn't make
> > sense to keep the indirection.  I still fear we have way too many
> > MEMs in RTL that this would be noticeable.
>
> It would be interesting to have numbers about the amount of sharing
> that happens

A pathetic amount.  From compiling cse.c combine.c tree.c and
dwarf2out.c the top 10 users of MEMs per routine are:

With -O0:
 MEMs  ATTR  name
  970   485  combine_simplify_rtx
 1011   464  simple_cst_equal
 1047   612  mem_loc_descriptor
 1173   442  walk_tree_1
 1296   515  loc_list_from_tree
 1431   690  simplify_comparison
 1911   752  substitute_placeholder_in_expr
 1951   745  substitute_in_expr
 2503  1532  cse_insn
 3242  2206  try_combine

With -O2:
 MEMs  ATTR  name
  514   502  gen_tagged_type_die
  701   536  simplify_comparison
  743   877  find_decls_types_r
  851   840  dwarf2out_finish
  863   784  loc_list_from_tree
  916   839  combine_simplify_rtx
  978   878  gen_subprogram_die
 1650  1475  cse_insn
 1720  1782  mem_loc_descriptor
 2336  1792  try_combine

Summing doesn't make sense, but the routines with largest differences:

-O0:
  532  force_to_mode
  547  simple_cst_equal
  640  simplify_shift_const_1
  731  walk_tree_1
  741  simplify_comparison
  781  loc_list_from_tree
  971  cse_insn
 1036  try_combine
 1159  substitute_placeholder_in_expr
 1206  substitute_in_expr

-O2:
  100  gen_subprogram_die
  101  make_extraction
  112  output_loc_sequence
  122  if_then_else_cond
  124  substitute_placeholder_in_expr
  144  simplify_shift_const_1
  165  simplify_comparison
  175  cse_insn
  205  simplify_if_then_else
  544  try_combine

(Using -g or not doesn't make a difference).  I've counted all MEM rtx
in the whole insn stream at finalization time (i.e. slightly less than
potentially are actually generated during RTL passes).
ATTR is the number of unique mem_attrs ever created by set_mem_attrs,
reset to zero at each function start (including emptying the htab).

That is, we save a whopping 48 kilobyte due to this fantastic hash
table :-) (offset by the need for a pointer in the MEM rtx).  Just
remove the whole thing.  Same for the reg_attrs hash table (I haven't
measured that one, though).

> - might be not trivial though, as some re-uses would be able to
>   simply modify the attr inplace.

Ciao,
Michael.
=== modified file 'gcc/emit-rtl.c'
--- gcc/emit-rtl.c	2011-05-29 17:40:05 +0000
+++ gcc/emit-rtl.c	2011-08-21 04:44:25 +0000
@@ -256,11 +256,10 @@ mem_attrs_htab_hash (const void *x)
 {
   const mem_attrs *const p = (const mem_attrs *) x;

-  return (p->alias ^ (p->align * 1000)
-	  ^ (p->addrspace * 4000)
-	  ^ ((p->offset ? INTVAL (p->offset) : 0) * 50000)
-	  ^ ((p->size ? INTVAL (p->size) : 0) * 2500000)
-	  ^ (size_t) iterative_hash_expr (p->expr, 0));
+  /* By massively feeding the mem_attrs struct to iterative_hash() we
+     disregard the p->offset and p->size rtx, but in total the hash is
+     quick and good enough.  */
+  return iterative_hash_object (*p, iterative_hash_expr (p->expr, 0));
 }

 /* Returns nonzero if the value represented by X (which is really a
@@ -5494,7 +5500,7 @@ init_emit_once (void)
   const_fixed_htab = htab_create_ggc (37, const_fixed_htab_hash,
				      const_fixed_htab_eq, NULL);

-  mem_attrs_htab = htab_create_ggc (37, mem_attrs_htab_hash,
+  mem_attrs_htab = htab_create_ggc (128, mem_attrs_htab_hash,
				    mem_attrs_htab_eq, NULL);

   reg_attrs_htab = htab_create_ggc (37, reg_attrs_htab_hash,
				    reg_attrs_htab_eq, NULL);