Patchwork Disable bounds checking for edge iterators in release compiler

login
register
mail settings
Submitter Jan Hubicka
Date June 8, 2010, 7:17 a.m.
Message ID <20100608071707.GB8913@kam.mff.cuni.cz>
Download mbox | patch
Permalink /patch/54943/
State New
Headers show

Comments

Jan Hubicka - June 8, 2010, 7:17 a.m.
Hi,
oprofile shows as one of highest sample count the following sanity check:
               :/* Advance the iterator to the next element.  */
               :static inline void
               :ei_next (edge_iterator *i)
               :{
  6578  0.4915 :  gcc_assert (i->index < EDGE_COUNT (ei_container (*i)));
   201  0.0150 :  i->index++;
               :}

I guess it is mostly cache miss that would be later attributed elsewhere, but
it also seems to me that this kind of bounds checking is something we want to
omit in release compiler (same was as we disable VEC range checking and such).
The probability that we will get useful error here is relatively low compared
to amount of checks we spread across compiler by inlining the function.

I tested the attached patch.  It reduce size of stripped cc1 binary (with WHOPR
build) from 11944192 to 11939616 bytes (4.5KB) and time needed to build cc1
binary from 10m29s to 10m19s. With non-WHOPR this should be more effective
since we are worse on optimizing out unnecesary checks, but I didn't verified.

Bootstrapped/regtested x86_64-linux. If this seems to make sense, I will look at the
other inline functions sanity checks.  I think in gimple.h we also have few quite
good candidates for ENABLE_CHECKING.

Honza

	* basic-block.h (single_succ_edge, single_pred_edge, ei_container,
	ei_next, ei_prev): Do sanity checking with ENABLE_CHECKING only.
Manuel López-Ibáñez - June 8, 2010, 9:23 a.m.
On 8 June 2010 09:17, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> oprofile shows as one of highest sample count the following sanity check:
>               :/* Advance the iterator to the next element.  */
>               :static inline void
>               :ei_next (edge_iterator *i)
>               :{
>  6578  0.4915 :  gcc_assert (i->index < EDGE_COUNT (ei_container (*i)));
>   201  0.0150 :  i->index++;
>               :}
>
> I guess it is mostly cache miss that would be later attributed elsewhere, but
> it also seems to me that this kind of bounds checking is something we want to
> omit in release compiler (same was as we disable VEC range checking and such).
> The probability that we will get useful error here is relatively low compared
> to amount of checks we spread across compiler by inlining the function.

Why is not gcc_assert a NOP without ENABLE_CHECKING? I was pretty sure
it was and hence I was using it very liberally.

I wonder what the performance of the compiler would be in that case.

Cheers,

Manuel
Jakub Jelinek - June 8, 2010, 9:32 a.m.
On Tue, Jun 08, 2010 at 11:23:18AM +0200, Manuel López-Ibáñez wrote:
> On 8 June 2010 09:17, Jan Hubicka <hubicka@ucw.cz> wrote:
> > Hi,
> > oprofile shows as one of highest sample count the following sanity check:
> >               :/* Advance the iterator to the next element.  */
> >               :static inline void
> >               :ei_next (edge_iterator *i)
> >               :{
> >  6578  0.4915 :  gcc_assert (i->index < EDGE_COUNT (ei_container (*i)));
> >   201  0.0150 :  i->index++;
> >               :}
> >
> > I guess it is mostly cache miss that would be later attributed elsewhere, but
> > it also seems to me that this kind of bounds checking is something we want to
> > omit in release compiler (same was as we disable VEC range checking and such).
> > The probability that we will get useful error here is relatively low compared
> > to amount of checks we spread across compiler by inlining the function.
> 
> Why is not gcc_assert a NOP without ENABLE_CHECKING? I was pretty sure
> it was and hence I was using it very liberally.
> 
> I wonder what the performance of the compiler would be in that case.

Because we have --enable-checking=release and --disable-checking.
--enable-checking=release keeps gcc_asserts around, --disable-checking
optimizes them out.
The reason we want inexpensive asserts around is it is preferrable to crash
the compiler over silently generating wrong code.
More expensive asserts (or ones on very hot paths) should be
guarded with #ifdef ENABLE_CHECKING.

	Jakub
Richard Guenther - June 8, 2010, 9:48 a.m.
On Tue, Jun 8, 2010 at 9:17 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> Hi,
> oprofile shows as one of highest sample count the following sanity check:
>               :/* Advance the iterator to the next element.  */
>               :static inline void
>               :ei_next (edge_iterator *i)
>               :{
>  6578  0.4915 :  gcc_assert (i->index < EDGE_COUNT (ei_container (*i)));
>   201  0.0150 :  i->index++;
>               :}
>
> I guess it is mostly cache miss that would be later attributed elsewhere, but
> it also seems to me that this kind of bounds checking is something we want to
> omit in release compiler (same was as we disable VEC range checking and such).
> The probability that we will get useful error here is relatively low compared
> to amount of checks we spread across compiler by inlining the function.
>
> I tested the attached patch.  It reduce size of stripped cc1 binary (with WHOPR
> build) from 11944192 to 11939616 bytes (4.5KB) and time needed to build cc1
> binary from 10m29s to 10m19s. With non-WHOPR this should be more effective
> since we are worse on optimizing out unnecesary checks, but I didn't verified.
>
> Bootstrapped/regtested x86_64-linux. If this seems to make sense, I will look at the
> other inline functions sanity checks.  I think in gimple.h we also have few quite
> good candidates for ENABLE_CHECKING.

I thought I've gone over gimple.h at some point ...

Ok.

Thanks,
Richard.

> Honza
>
>        * basic-block.h (single_succ_edge, single_pred_edge, ei_container,
>        ei_next, ei_prev): Do sanity checking with ENABLE_CHECKING only.
> Index: basic-block.h
> ===================================================================
> --- basic-block.h       (revision 160382)
> +++ basic-block.h       (working copy)
> @@ -554,7 +554,9 @@ single_pred_p (const_basic_block bb)
>  static inline edge
>  single_succ_edge (const_basic_block bb)
>  {
> +#ifdef ENABLE_CHECKING
>   gcc_assert (single_succ_p (bb));
> +#endif
>   return EDGE_SUCC (bb, 0);
>  }
>
> @@ -564,7 +566,9 @@ single_succ_edge (const_basic_block bb)
>  static inline edge
>  single_pred_edge (const_basic_block bb)
>  {
> +#ifdef ENABLE_CHECKING
>   gcc_assert (single_pred_p (bb));
> +#endif
>   return EDGE_PRED (bb, 0);
>  }
>
> @@ -596,7 +600,9 @@ typedef struct {
>  static inline VEC(edge,gc) *
>  ei_container (edge_iterator i)
>  {
> +#ifdef ENABLE_CHECKING
>   gcc_assert (i.container);
> +#endif
>   return *i.container;
>  }
>
> @@ -647,7 +653,9 @@ ei_one_before_end_p (edge_iterator i)
>  static inline void
>  ei_next (edge_iterator *i)
>  {
> +#ifdef ENABLE_CHECKING
>   gcc_assert (i->index < EDGE_COUNT (ei_container (*i)));
> +#endif
>   i->index++;
>  }
>
> @@ -655,7 +663,9 @@ ei_next (edge_iterator *i)
>  static inline void
>  ei_prev (edge_iterator *i)
>  {
> +#ifdef ENABLE_CHECKING
>   gcc_assert (i->index > 0);
> +#endif
>   i->index--;
>  }
>
>
Manuel López-Ibáñez - June 8, 2010, 10:37 a.m.
On 8 June 2010 11:32, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Jun 08, 2010 at 11:23:18AM +0200, Manuel López-Ibáñez wrote:
>> On 8 June 2010 09:17, Jan Hubicka <hubicka@ucw.cz> wrote:
>> > Hi,
>> > oprofile shows as one of highest sample count the following sanity check:
>> >               :/* Advance the iterator to the next element.  */
>> >               :static inline void
>> >               :ei_next (edge_iterator *i)
>> >               :{
>> >  6578  0.4915 :  gcc_assert (i->index < EDGE_COUNT (ei_container (*i)));
>> >   201  0.0150 :  i->index++;
>> >               :}
>> >
>> > I guess it is mostly cache miss that would be later attributed elsewhere, but
>> > it also seems to me that this kind of bounds checking is something we want to
>> > omit in release compiler (same was as we disable VEC range checking and such).
>> > The probability that we will get useful error here is relatively low compared
>> > to amount of checks we spread across compiler by inlining the function.
>>
>> Why is not gcc_assert a NOP without ENABLE_CHECKING? I was pretty sure
>> it was and hence I was using it very liberally.
>>
>> I wonder what the performance of the compiler would be in that case.
>
> Because we have --enable-checking=release and --disable-checking.
> --enable-checking=release keeps gcc_asserts around, --disable-checking
> optimizes them out.
> The reason we want inexpensive asserts around is it is preferrable to crash
> the compiler over silently generating wrong code.
> More expensive asserts (or ones on very hot paths) should be
> guarded with #ifdef ENABLE_CHECKING.

Good to know. Thanks for the explanation!

Manuel.
Jan Hubicka - June 8, 2010, 8:04 p.m.
> 
> I thought I've gone over gimple.h at some point ...
> 
those are static counts of .h asserts surviving to .optimize dumps with -O3 WHOPR build
      9   fancy_abort (&"../../gcc/tree-chrec.h"[0], 143, &__FUNCTION__[0]);
      9   fancy_abort (&"../../gcc/tree-chrec.h"[0], 147, &__FUNCTION__[0]);
      9   fancy_abort (&"../../gcc/tree-chrec.h"[0], 149, &__FUNCTION__[0]);
      9   fancy_abort (&"../../gcc/tree-flow-inline.h"[0], 51, &__FUNCTION__[0]);
     10   fancy_abort (&"../../gcc/gimple.h"[0], 3218, &__FUNCTION__[0]);
     12   fancy_abort (&"../../gcc/gimple.h"[0], 2711, &__FUNCTION__[0]);
     12   fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1206, &__FUNCTION__[0]);
     12   fancy_abort (&"../../gcc/tree-flow-inline.h"[0], 144, &__FUNCTION__[0]);
     13   fancy_abort (&"../../gcc/gimple.h"[0], 2678, &__FUNCTION__[0]);
     13   fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1331, &__FUNCTION__[0]);
     13   fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1332, &__FUNCTION__[0]);
     13   fancy_abort (&"../../gcc/tree-flow-inline.h"[0], 729, &__FUNCTION__[0]);
     14   fancy_abort (&"../../gcc/gimple.h"[0], 3271, &__FUNCTION__[0]);
     14   fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1067, &__FUNCTION__[0]);
     14   fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1127, &__FUNCTION__[0]);
     20   fancy_abort (&"../../gcc/graphite-poly.h"[0], 1053, &__FUNCTION__[0]);
     20   fancy_abort (&"../../gcc/graphite-poly.h"[0], 608, &__FUNCTION__[0]);
     21   fancy_abort (&"../../gcc/tree-vectorizer.h"[0], 644, &__FUNCTION__[0]);
     28   fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1052, &__FUNCTION__[0]);
     29   fancy_abort (&"../../gcc/basic-block.h"[0], 567, &__FUNCTION__[0]);
     93   fancy_abort (&"../../gcc/basic-block.h"[0], 557, &__FUNCTION__[0]);
     93   fancy_abort (&"../../gcc/basic-block.h"[0], 599, &__FUNCTION__[0]);
    133   fancy_abort (&"../../gcc/gimple.h"[0], 1470, &__FUNCTION__[0]);
    162   fancy_abort (&"../../gcc/gimple.h"[0], 1686, &__FUNCTION__[0]);
    178   fancy_abort (&"../../gcc/tree-flow-inline.h"[0], 510, &__FUNCTION__[0]);
    234   fancy_abort (&"../../gcc/basic-block.h"[0], 650, &__FUNCTION__[0]);
    240   fancy_abort (&"../../gcc/tree-vectorizer.h"[0], 634, &__FUNCTION__[0]);
    250   fancy_abort (&"../../gcc/gimple.h"[0], 3159, &__FUNCTION__[0]);
    685   fancy_abort (&"../../gcc/gimple.h"[0], 1643, &__FUNCTION__[0]);

so looks like gimple.h is major offender even if it did not show in my profile.

My current oprofile is as follows:
27416     4.2304  htab_find_slot_with_hash
11987     1.8496  bitmap_ior_into
8987      1.3867  bitmap_set_bit_1
8347      1.2880  df_note_compute.40921.3735
8034      1.2397  df_worklist_dataflow
7338      1.1323  fast_dce.463781.constprop.768
7208      1.1122  lto_get_section_data
6815      1.0516  bitmap_clear_bit
6762      1.0434  bitmap_and
6146      0.9484  bitmap_copy
5993      0.9247  bitmap_ior_and_compl
5414      0.8354  ggc_alloc_stat
5248      0.8098  lto_input_tree
5123      0.7905  cleanup_cfg
5083      0.7843  record_reg_classes.102605.constprop.811.3366
5059      0.7806  htab_find_with_hash
4763      0.7349  bitmap_and_into
4650      0.7175  walk_tree_1.constprop.778
4597      0.7093  bitmap_bit_p_1
4593      0.7087  inverted_post_order_compute
4567      0.7047  lto_output_tree
4473      0.6902  extract_insn
4305      0.6643  execute_function_todo.149820.4272
4222      0.6515  htab_expand.538910
3946      0.6089  pool_alloc
3804      0.5870  post_order_compute
3785      0.5840  lto_output_1_stream
3730      0.5756  sorted_array_from_bitmap_set.307068.4314
3724      0.5746  constrain_operands


This is still with inlined bitmap operations that now seem to help perofrmance, I will
get more clear numbers on that.  Also have some optimizations on the bitmap_ior_into
and friends that however does not show much speedups overall since they are cache
bound (perhaps doing rest of obstack changes will help this)

fast_dce seems to be replaceable with ud-dce that has better worst case
scenario (fast_dce is calling iterative dataflow several times, so part of the
dataflow slowness is comming from that. I want to analyze what really makes it
to iterate so many times on GCC and what can be done about it).  I am still not
sure what to do about df notes - those are mostly bitmap operations

htab_find_slot_with_hash is I think combination of type merging and var tracking,
but I did not look terribly deep into it.

Honza
Richard Guenther - June 9, 2010, 9:47 a.m.
On Tue, Jun 8, 2010 at 10:04 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>
>> I thought I've gone over gimple.h at some point ...
>>
> those are static counts of .h asserts surviving to .optimize dumps with -O3 WHOPR build
>      9   fancy_abort (&"../../gcc/tree-chrec.h"[0], 143, &__FUNCTION__[0]);
>      9   fancy_abort (&"../../gcc/tree-chrec.h"[0], 147, &__FUNCTION__[0]);
>      9   fancy_abort (&"../../gcc/tree-chrec.h"[0], 149, &__FUNCTION__[0]);
>      9   fancy_abort (&"../../gcc/tree-flow-inline.h"[0], 51, &__FUNCTION__[0]);
>     10   fancy_abort (&"../../gcc/gimple.h"[0], 3218, &__FUNCTION__[0]);
>     12   fancy_abort (&"../../gcc/gimple.h"[0], 2711, &__FUNCTION__[0]);
>     12   fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1206, &__FUNCTION__[0]);
>     12   fancy_abort (&"../../gcc/tree-flow-inline.h"[0], 144, &__FUNCTION__[0]);
>     13   fancy_abort (&"../../gcc/gimple.h"[0], 2678, &__FUNCTION__[0]);
>     13   fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1331, &__FUNCTION__[0]);
>     13   fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1332, &__FUNCTION__[0]);
>     13   fancy_abort (&"../../gcc/tree-flow-inline.h"[0], 729, &__FUNCTION__[0]);
>     14   fancy_abort (&"../../gcc/gimple.h"[0], 3271, &__FUNCTION__[0]);
>     14   fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1067, &__FUNCTION__[0]);
>     14   fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1127, &__FUNCTION__[0]);
>     20   fancy_abort (&"../../gcc/graphite-poly.h"[0], 1053, &__FUNCTION__[0]);
>     20   fancy_abort (&"../../gcc/graphite-poly.h"[0], 608, &__FUNCTION__[0]);
>     21   fancy_abort (&"../../gcc/tree-vectorizer.h"[0], 644, &__FUNCTION__[0]);
>     28   fancy_abort (&"../../gcc/sel-sched-ir.h"[0], 1052, &__FUNCTION__[0]);
>     29   fancy_abort (&"../../gcc/basic-block.h"[0], 567, &__FUNCTION__[0]);
>     93   fancy_abort (&"../../gcc/basic-block.h"[0], 557, &__FUNCTION__[0]);
>     93   fancy_abort (&"../../gcc/basic-block.h"[0], 599, &__FUNCTION__[0]);
>    133   fancy_abort (&"../../gcc/gimple.h"[0], 1470, &__FUNCTION__[0]);
>    162   fancy_abort (&"../../gcc/gimple.h"[0], 1686, &__FUNCTION__[0]);
>    178   fancy_abort (&"../../gcc/tree-flow-inline.h"[0], 510, &__FUNCTION__[0]);
>    234   fancy_abort (&"../../gcc/basic-block.h"[0], 650, &__FUNCTION__[0]);
>    240   fancy_abort (&"../../gcc/tree-vectorizer.h"[0], 634, &__FUNCTION__[0]);
>    250   fancy_abort (&"../../gcc/gimple.h"[0], 3159, &__FUNCTION__[0]);
>    685   fancy_abort (&"../../gcc/gimple.h"[0], 1643, &__FUNCTION__[0]);
>
> so looks like gimple.h is major offender even if it did not show in my profile.
>
> My current oprofile is as follows:
> 27416     4.2304  htab_find_slot_with_hash
> 11987     1.8496  bitmap_ior_into
> 8987      1.3867  bitmap_set_bit_1
> 8347      1.2880  df_note_compute.40921.3735
> 8034      1.2397  df_worklist_dataflow
> 7338      1.1323  fast_dce.463781.constprop.768
> 7208      1.1122  lto_get_section_data
> 6815      1.0516  bitmap_clear_bit
> 6762      1.0434  bitmap_and
> 6146      0.9484  bitmap_copy
> 5993      0.9247  bitmap_ior_and_compl
> 5414      0.8354  ggc_alloc_stat
> 5248      0.8098  lto_input_tree
> 5123      0.7905  cleanup_cfg
> 5083      0.7843  record_reg_classes.102605.constprop.811.3366
> 5059      0.7806  htab_find_with_hash
> 4763      0.7349  bitmap_and_into
> 4650      0.7175  walk_tree_1.constprop.778
> 4597      0.7093  bitmap_bit_p_1
> 4593      0.7087  inverted_post_order_compute
> 4567      0.7047  lto_output_tree
> 4473      0.6902  extract_insn
> 4305      0.6643  execute_function_todo.149820.4272
> 4222      0.6515  htab_expand.538910
> 3946      0.6089  pool_alloc
> 3804      0.5870  post_order_compute
> 3785      0.5840  lto_output_1_stream
> 3730      0.5756  sorted_array_from_bitmap_set.307068.4314
> 3724      0.5746  constrain_operands
>
>
> This is still with inlined bitmap operations that now seem to help perofrmance, I will
> get more clear numbers on that.  Also have some optimizations on the bitmap_ior_into
> and friends that however does not show much speedups overall since they are cache
> bound (perhaps doing rest of obstack changes will help this)
>
> fast_dce seems to be replaceable with ud-dce that has better worst case
> scenario (fast_dce is calling iterative dataflow several times, so part of the
> dataflow slowness is comming from that. I want to analyze what really makes it
> to iterate so many times on GCC and what can be done about it).  I am still not
> sure what to do about df notes - those are mostly bitmap operations
>
> htab_find_slot_with_hash is I think combination of type merging and var tracking,
> but I did not look terribly deep into it.

For the type merging I have a patch, but I will delay it until next week.
I posted it and there's a i386 backend fix in it, so if you want ot have
a look....

Richard.

> Honza
>

Patch

Index: basic-block.h
===================================================================
--- basic-block.h	(revision 160382)
+++ basic-block.h	(working copy)
@@ -554,7 +554,9 @@  single_pred_p (const_basic_block bb)
 static inline edge
 single_succ_edge (const_basic_block bb)
 {
+#ifdef ENABLE_CHECKING
   gcc_assert (single_succ_p (bb));
+#endif
   return EDGE_SUCC (bb, 0);
 }
 
@@ -564,7 +566,9 @@  single_succ_edge (const_basic_block bb)
 static inline edge
 single_pred_edge (const_basic_block bb)
 {
+#ifdef ENABLE_CHECKING
   gcc_assert (single_pred_p (bb));
+#endif
   return EDGE_PRED (bb, 0);
 }
 
@@ -596,7 +600,9 @@  typedef struct {
 static inline VEC(edge,gc) *
 ei_container (edge_iterator i)
 {
+#ifdef ENABLE_CHECKING
   gcc_assert (i.container);
+#endif
   return *i.container;
 }
 
@@ -647,7 +653,9 @@  ei_one_before_end_p (edge_iterator i)
 static inline void
 ei_next (edge_iterator *i)
 {
+#ifdef ENABLE_CHECKING
   gcc_assert (i->index < EDGE_COUNT (ei_container (*i)));
+#endif
   i->index++;
 }
 
@@ -655,7 +663,9 @@  ei_next (edge_iterator *i)
 static inline void
 ei_prev (edge_iterator *i)
 {
+#ifdef ENABLE_CHECKING
   gcc_assert (i->index > 0);
+#endif
   i->index--;
 }