diff mbox series

Do not account __builtin_unreachable guards in inliner

Message ID ZJAJMPiUB4oxqZBR@kam.mff.cuni.cz
State New
Headers show
Series Do not account __builtin_unreachable guards in inliner | expand

Commit Message

Jan Hubicka June 19, 2023, 7:52 a.m. UTC
Hi,
this was suggested earlier somewhere, but I can not find the thread.
C++ has an assume attribute that expands into
  if (conditional)
    __builtin_unreachable ()
We do not want to account the conditional in inline heuristics since
we know that it is going to be optimized out.

Bootstrapped/regtested x86_64-linux, will commit it later today if
there are no complaints.

gcc/ChangeLog:

	* ipa-fnsummary.cc (builtin_unreachable_bb_p): New function.
	(analyze_function_body): Do not account conditionals guarding
	builtin_unreachable calls.

gcc/testsuite/ChangeLog:

	* gcc.dg/ipa/fnsummary-1.c: New test.

Comments

Richard Biener June 19, 2023, 9:01 a.m. UTC | #1
On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
> this was suggested earlier somewhere, but I can not find the thread.
> C++ has assume attribute that expands int
>   if (conditional)
>     __builtin_unreachable ()
> We do not want to account the conditional in inline heuristics since
> we know that it is going to be optimized out.
>
> Bootstrapped/regtested x86_64-linux, will commit it later today if
> thre are no complains.

I think we also had the request to not account the condition feeding
stmts (if they only feed it and have no side-effects).  libstdc++ has
complex range comparisons here.  Also ...

> gcc/ChangeLog:
>
>         * ipa-fnsummary.cc (builtin_unreachable_bb_p): New function.
>         (analyze_function_body): Do not account conditionals guarding
>         builtin_unreachable calls.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/ipa/fnsummary-1.c: New test.
>
> diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
> index a5f5a50c8a5..987da29ec34 100644
> --- a/gcc/ipa-fnsummary.cc
> +++ b/gcc/ipa-fnsummary.cc
> @@ -2649,6 +2649,54 @@ points_to_possible_sra_candidate_p (tree t)
>    return false;
>  }
>
> +/* Return true if BB is builtin_unreachable.
> +   We skip empty basic blocks, debug statements, clobbers and predicts.
> +   CACHE is used to memoize already analyzed blocks.  */
> +
> +static bool
> +builtin_unreachable_bb_p (basic_block bb, vec<unsigned char> &cache)
> +{
> +  if (cache[bb->index])
> +    return cache[bb->index] - 1;
> +  gimple_stmt_iterator si;
> +  auto_vec <basic_block, 4> visited_bbs;
> +  bool ret = false;
> +  while (true)
> +    {
> +      bool empty_bb = true;
> +      visited_bbs.safe_push (bb);
> +      cache[bb->index] = 3;
> +      for (si = gsi_start_nondebug_bb (bb);
> +          !gsi_end_p (si) && empty_bb;
> +          gsi_next_nondebug (&si))
> +       {
> +         if (gimple_code (gsi_stmt (si)) != GIMPLE_PREDICT
> +             && !gimple_clobber_p (gsi_stmt (si))
> +             && !gimple_nop_p (gsi_stmt (si)))
> +           {
> +             empty_bb = false;
> +             break;
> +           }
> +       }
> +      if (!empty_bb)
> +       break;
> +      else
> +       bb = single_succ_edge (bb)->dest;
> +      if (cache[bb->index])
> +       {
> +         ret = cache[bb->index] == 3 ? false : cache[bb->index] - 1;
> +         goto done;
> +       }
> +    }
> +  if (gimple_call_builtin_p (gsi_stmt (si), BUILT_IN_UNREACHABLE)
> +      || gimple_call_builtin_p (gsi_stmt (si), BUILT_IN_UNREACHABLE_TRAP))

... we do code generate BUILT_IN_UNREACHABLE_TRAP, no?

> +    ret = true;
> +done:
> +  for (basic_block vbb:visited_bbs)
> +    cache[vbb->index] = (unsigned char)ret + 1;
> +  return ret;
> +}
> +
>  /* Analyze function body for NODE.
>     EARLY indicates run from early optimization pipeline.  */
>
> @@ -2743,6 +2791,8 @@ analyze_function_body (struct cgraph_node *node, bool early)
>    const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
>    order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
>    nblocks = pre_and_rev_post_order_compute (NULL, order, false);
> +  auto_vec<unsigned char, 10> cache;
> +  cache.safe_grow_cleared (last_basic_block_for_fn (cfun));

A sbitmap with two bits per entry would be more space efficient here.  bitmap
has bitmap_set_aligned_chunk and bitmap_get_aligned_chunk for convenience,
adding the corresponding to sbitmap.h would likely ease use there as well.

>    for (n = 0; n < nblocks; n++)
>      {
>        bb = BASIC_BLOCK_FOR_FN (cfun, order[n]);
> @@ -2901,6 +2951,24 @@ analyze_function_body (struct cgraph_node *node, bool early)
>                 }
>             }
>
> +         /* Conditionals guarding __builtin_unreachable will be
> +            optimized out.  */
> +         if (gimple_code (stmt) == GIMPLE_COND)
> +           {
> +             edge_iterator ei;
> +             edge e;
> +             FOR_EACH_EDGE (e, ei, bb->succs)
> +               if (builtin_unreachable_bb_p (e->dest, cache))
> +                 {
> +                   if (dump_file)
> +                     fprintf (dump_file,
> +                              "\t\tConditional guarding __builtin_unreachable; ignored\n");
> +                   this_time = 0;
> +                   this_size = 0;
> +                   break;
> +                 }
> +           }
> +
>           /* TODO: When conditional jump or switch is known to be constant, but
>              we did not translate it into the predicates, we really can account
>              just maximum of the possible paths.  */
> diff --git a/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c b/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c
> new file mode 100644
> index 00000000000..a0ece0c300b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c
> @@ -0,0 +1,8 @@
> +/* { dg-options "-O2 -fdump-ipa-fnsummary"  } */
> +int
> +test(int a)
> +{
> +       if (a > 10)
> +               __builtin_unreachable ();
> +}
> +/* { dg-final { scan-ipa-dump "Conditional guarding __builtin_unreachable" "fnsummary"  } } */
Jan Hubicka June 19, 2023, 10:15 a.m. UTC | #2
> On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi,
> > this was suggested earlier somewhere, but I can not find the thread.
> > C++ has assume attribute that expands int
> >   if (conditional)
> >     __builtin_unreachable ()
> > We do not want to account the conditional in inline heuristics since
> > we know that it is going to be optimized out.
> >
> > Bootstrapped/regtested x86_64-linux, will commit it later today if
> > thre are no complains.
> 
> I think we also had the request to not account the condition feeding
> stmts (if they only feed it and have no side-effects).  libstdc++ has
> complex range comparisons here.  Also ...

I was thinking of this: it depends on how smart we want to get.
We also have dead conditionals guarding clobbers, predicts and other
stuff.  In general we can use the mark phase of cd-dce, telling it to
ignore those statements, and then use its result in the analysis.

Also question is how much we can rely on middle-end optimizing out
unreachables.  For example:
int a;
int b[3];
test()
{
  if (a > 0)
    {
      for (int i = 0; i < 3; i++)
	  b[i] = 0;
      __builtin_unreachable ();
    }
}

IMO can be optimized to empty function.  I believe we used to have code
in tree-cfgcleanup to remove statements just before
__builtin_unreachable which cannot terminate execution, but perhaps it
existed only in my local tree?
We could also perhaps declare unreachable NOVOPs, which would make DSE
remove the stores.

> 
> ... we do code generate BUILT_IN_UNREACHABLE_TRAP, no?

You are right.  I tested it with -funreachable-traps but it does not do
what I expected, I need -fsanitize=unreachable -fsanitize-trap=unreachable

Also if I try to call it by hand I get:

jan@localhost:/tmp> gcc t.c -S -O2 -funreachable-traps -fdump-tree-all-details -fsanitize=unreachable -fsanitize-trap=unreachable
t.c: In function ‘test’:
t.c:9:13: warning: implicit declaration of function ‘__builtin_unreachable_trap’; did you mean ‘__builtin_unreachable trap’? [-Wimplicit-function-declaration]
    9 |             __builtin_unreachable_trap ();
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
      |             __builtin_unreachable trap

Which is not as helpful as it is trying to be.
> 
> > +    ret = true;
> > +done:
> > +  for (basic_block vbb:visited_bbs)
> > +    cache[vbb->index] = (unsigned char)ret + 1;
> > +  return ret;
> > +}
> > +
> >  /* Analyze function body for NODE.
> >     EARLY indicates run from early optimization pipeline.  */
> >
> > @@ -2743,6 +2791,8 @@ analyze_function_body (struct cgraph_node *node, bool early)
> >    const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
> >    order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> >    nblocks = pre_and_rev_post_order_compute (NULL, order, false);
> > +  auto_vec<unsigned char, 10> cache;
> > +  cache.safe_grow_cleared (last_basic_block_for_fn (cfun));
> 
> A sbitmap with two bits per entry would be more space efficient here.  bitmap
> has bitmap_set_aligned_chunk and bitmap_get_aligned_chunk for convenience,
> adding the corresponding to sbitmap.h would likely ease use there as well.

I did not know about the chunk API which is certainly nice :)
sbitmap will always allocate, while here we stay on stack for small
functions and I am not sure how much extra bit operations would not
offset extra memset, but overall I think it is all in noise.

Honza
Richard Biener June 19, 2023, 11:30 a.m. UTC | #3
On Mon, Jun 19, 2023 at 12:15 PM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Hi,
> > > this was suggested earlier somewhere, but I can not find the thread.
> > > C++ has assume attribute that expands int
> > >   if (conditional)
> > >     __builtin_unreachable ()
> > > We do not want to account the conditional in inline heuristics since
> > > we know that it is going to be optimized out.
> > >
> > > Bootstrapped/regtested x86_64-linux, will commit it later today if
> > > thre are no complains.
> >
> > I think we also had the request to not account the condition feeding
> > stmts (if they only feed it and have no side-effects).  libstdc++ has
> > complex range comparisons here.  Also ...
>
> I was thinking of this: it depends on how smart do we want to get.
> We also have dead conditionals guarding clobbers, predicts and other
> stuff.  In general we can use mark phase of cd-dce telling it to ignore
> those statements and then use its resut in the analysis.

Hmm, possible but a bit heavy-handed.  There's simple_dce_from_worklist
which might be a way to do this (of course we cannot use that 1:1).  Also
then consider

 a = a + 1;
 if (a > 10)
   __builtin_unreachable ();
 if (a < 5)
   __builtin_unreachable ();

and a has more than one use but both are going away.  So indeed a
more global analysis would be needed to get the full benefit.

> Also question is how much we can rely on middle-end optimizing out
> unreachables.  For example:
> int a;
> int b[3];
> test()
> {
>   if (a > 0)
>     {
>       for (int i = 0; i < 3; i++)
>           b[i] = 0;
>       __builtin_unreachable ();
>     }
> }
>
> IMO can be optimized to empty function.  I believe we used to have code
> in tree-cfgcleanup to remove statements just before
> __builtin_unreachable which can not terminate execution, but perhaps it
> existed only in my local tree?

I think we rely on DCE/DSE here and explicit unreachable () pruning after
VRP picked up things (I think it simply gets the secondary effect optimizing
the condition it created the range for in the first pass).

DSE is apparently not able to kill the stores; I will fix that.  I
think DCE can, but only for non-aliased stores.

> We could also perhaps declare unreachable NOVOPs which would make DSE to
> remove the stores.

But only because of a bug in DSE ... it also removes them if that
__builtin_unreachable ()
is GIMPLE_RESX.

> >
> > ... we do code generate BUILT_IN_UNREACHABLE_TRAP, no?
>
> You are right.  I tested it with -funreachable-traps but it does not do
> what I expected, I need -fsanitize=unreachable -fsanitize-trap=unreachable
>
> Also if I try to call it by hand I get:
>
> jan@localhost:/tmp> gcc t.c -S -O2 -funreachable-traps -fdump-tree-all-details -fsanitize=unreachable -fsanitize-trap=unreachable
> t.c: In function ‘test’:
> t.c:9:13: warning: implicit declaration of function ‘__builtin_unreachable_trap’; did you mean ‘__builtin_unreachable trap’? [-Wimplicit-function-declaration]
>     9 |             __builtin_unreachable_trap ();
>       |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
>       |             __builtin_unreachable trap
>
> Which is not as helpful as it is trying to be.
> >
> > > +    ret = true;
> > > +done:
> > > +  for (basic_block vbb:visited_bbs)
> > > +    cache[vbb->index] = (unsigned char)ret + 1;
> > > +  return ret;
> > > +}
> > > +
> > >  /* Analyze function body for NODE.
> > >     EARLY indicates run from early optimization pipeline.  */
> > >
> > > @@ -2743,6 +2791,8 @@ analyze_function_body (struct cgraph_node *node, bool early)
> > >    const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
> > >    order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> > >    nblocks = pre_and_rev_post_order_compute (NULL, order, false);
> > > +  auto_vec<unsigned char, 10> cache;
> > > +  cache.safe_grow_cleared (last_basic_block_for_fn (cfun));
> >
> > A sbitmap with two bits per entry would be more space efficient here.  bitmap
> > has bitmap_set_aligned_chunk and bitmap_get_aligned_chunk for convenience,
> > adding the corresponding to sbitmap.h would likely ease use there as well.
>
> I did not know about the chunk API which is certainly nice :)
> sbitmap will always allocate, while here we stay on stack for small
> functions and I am not sure how much extra bit operations would not
> offset extra memset, but overall I think it is all in noise.

Ah, yeah.
> Honza
Richard Biener June 19, 2023, 11:40 a.m. UTC | #4
On Mon, Jun 19, 2023 at 1:30 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Jun 19, 2023 at 12:15 PM Jan Hubicka <hubicka@ucw.cz> wrote:
> >
> > > On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > Hi,
> > > > this was suggested earlier somewhere, but I can not find the thread.
> > > > C++ has assume attribute that expands int
> > > >   if (conditional)
> > > >     __builtin_unreachable ()
> > > > We do not want to account the conditional in inline heuristics since
> > > > we know that it is going to be optimized out.
> > > >
> > > > Bootstrapped/regtested x86_64-linux, will commit it later today if
> > > > thre are no complains.
> > >
> > > I think we also had the request to not account the condition feeding
> > > stmts (if they only feed it and have no side-effects).  libstdc++ has
> > > complex range comparisons here.  Also ...
> >
> > I was thinking of this: it depends on how smart do we want to get.
> > We also have dead conditionals guarding clobbers, predicts and other
> > stuff.  In general we can use mark phase of cd-dce telling it to ignore
> > those statements and then use its resut in the analysis.
>
> Hmm, possible but a bit heavy-handed.  There's simple_dce_from_worklist
> which might be a way to do this (of course we cannot use that 1:1).  Also
> then consider
>
>  a = a + 1;
>  if (a > 10)
>    __builtin_unreachable ();
>  if (a < 5)
>    __builtin_unreachable ();
>
> and a has more than one use but both are going away.  So indeed a
> more global analysis would be needed to get the full benefit.
>
> > Also question is how much we can rely on middle-end optimizing out
> > unreachables.  For example:
> > int a;
> > int b[3];
> > test()
> > {
> >   if (a > 0)
> >     {
> >       for (int i = 0; i < 3; i++)
> >           b[i] = 0;
> >       __builtin_unreachable ();
> >     }
> > }
> >
> > IMO can be optimized to empty function.  I believe we used to have code
> > in tree-cfgcleanup to remove statements just before
> > __builtin_unreachable which can not terminate execution, but perhaps it
> > existed only in my local tree?
>
> I think we rely on DCE/DSE here and explicit unreachable () pruning after
> VRP picked up things (I think it simply gets the secondary effect optimizing
> the condition it created the range for in the first pass).
>
> DSE is appearantly not able to kill the stores, I will fix that.  I
> think DCE can,
> but only for non-aliased stores.
>
> > We could also perhaps declare unreachable NOVOPs which would make DSE to
> > remove the stores.
>
> But only because of a bug in DSE ... it also removes them if that
> __builtin_unreachable ()
> is GIMPLE_RESX.

Oh, and __builtin_unreachable is already 'const' and thus without any VOPs.  The
issue in DSE is that DSE will not run into __builtin_unreachable because it has
no VOPs.  Instead DSE relies on eventually seeing a VUSE for all paths leaving
a function, it doesn't have a way to consider __builtin_unreachable killing all
memory (it would need a VDEF for that).

It might be possible to record which virtual operands are live at BBs without
successors (but the VUSE on returns was an attempt to avoid the need for that).

So there's no easy way to fix DSE here.

> > >
> > > ... we do code generate BUILT_IN_UNREACHABLE_TRAP, no?
> >
> > You are right.  I tested it with -funreachable-traps but it does not do
> > what I expected, I need -fsanitize=unreachable -fsanitize-trap=unreachable
> >
> > Also if I try to call it by hand I get:
> >
> > jan@localhost:/tmp> gcc t.c -S -O2 -funreachable-traps -fdump-tree-all-details -fsanitize=unreachable -fsanitize-trap=unreachable
> > t.c: In function ‘test’:
> > t.c:9:13: warning: implicit declaration of function ‘__builtin_unreachable_trap’; did you mean ‘__builtin_unreachable trap’? [-Wimplicit-function-declaration]
> >     9 |             __builtin_unreachable_trap ();
> >       |             ^~~~~~~~~~~~~~~~~~~~~~~~~~
> >       |             __builtin_unreachable trap
> >
> > Which is not as helpful as it is trying to be.
> > >
> > > > +    ret = true;
> > > > +done:
> > > > +  for (basic_block vbb:visited_bbs)
> > > > +    cache[vbb->index] = (unsigned char)ret + 1;
> > > > +  return ret;
> > > > +}
> > > > +
> > > >  /* Analyze function body for NODE.
> > > >     EARLY indicates run from early optimization pipeline.  */
> > > >
> > > > @@ -2743,6 +2791,8 @@ analyze_function_body (struct cgraph_node *node, bool early)
> > > >    const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
> > > >    order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> > > >    nblocks = pre_and_rev_post_order_compute (NULL, order, false);
> > > > +  auto_vec<unsigned char, 10> cache;
> > > > +  cache.safe_grow_cleared (last_basic_block_for_fn (cfun));
> > >
> > > A sbitmap with two bits per entry would be more space efficient here.  bitmap
> > > has bitmap_set_aligned_chunk and bitmap_get_aligned_chunk for convenience,
> > > adding the corresponding to sbitmap.h would likely ease use there as well.
> >
> > I did not know about the chunk API which is certainly nice :)
> > sbitmap will always allocate, while here we stay on stack for small
> > functions and I am not sure how much extra bit operations would not
> > offset extra memset, but overall I think it is all in noise.
>
> Ah, yeah.
> > Honza
Jan Hubicka June 23, 2023, 10:11 a.m. UTC | #5
> On Mon, Jun 19, 2023 at 12:15 PM Jan Hubicka <hubicka@ucw.cz> wrote:
> >
> > > On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > Hi,
> > > > this was suggested earlier somewhere, but I can not find the thread.
> > > > C++ has assume attribute that expands int
> > > >   if (conditional)
> > > >     __builtin_unreachable ()
> > > > We do not want to account the conditional in inline heuristics since
> > > > we know that it is going to be optimized out.
> > > >
> > > > Bootstrapped/regtested x86_64-linux, will commit it later today if
> > > > thre are no complains.
> > >
> > > I think we also had the request to not account the condition feeding
> > > stmts (if they only feed it and have no side-effects).  libstdc++ has
> > > complex range comparisons here.  Also ...
> >
> > I was thinking of this: it depends on how smart do we want to get.
> > We also have dead conditionals guarding clobbers, predicts and other
> > stuff.  In general we can use mark phase of cd-dce telling it to ignore
> > those statements and then use its resut in the analysis.
> 
> Hmm, possible but a bit heavy-handed.  There's simple_dce_from_worklist
> which might be a way to do this (of course we cannot use that 1:1).  Also
> then consider
> 
>  a = a + 1;
>  if (a > 10)
>    __builtin_unreachable ();
>  if (a < 5)
>    __builtin_unreachable ();
> 
> and a has more than one use but both are going away.  So indeed a
> more global analysis would be needed to get the full benefit.

I was looking into simple_dce_from_worklist and, if I understand it
right, it simply walks a list of SSA names which probably lost some uses
by the consuming pass. If they have zero non-debug uses and the defining
statement has no side effects, then they are removed.

I think this does not really fit the bill here, since the example above
is likely to be common, and also if we want a single assignment feeding
a conditional to be optimized out, we probably want to handle the case
with multiple assignments.  What about
 1) walk function body and see if there are conditionals we know will be
    optimized out (at the beginning those can be only those which have one
    arm reaching __builtin_unreachable)
 2) if there are none, just proceed with fnsummary construction
 3) if there were some, do non-cd-dce mark stage which will skip those
    dead conditional identified in 1
    and proceed to fnsummary construction with additional bitmap of
    marked stmts.

This should be cheaper than unconditionally doing cd-dce and should
handle common cases?
Honza
Richard Biener June 23, 2023, 11:11 a.m. UTC | #6
On Fri, Jun 23, 2023 at 12:11 PM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > On Mon, Jun 19, 2023 at 12:15 PM Jan Hubicka <hubicka@ucw.cz> wrote:
> > >
> > > > On Mon, Jun 19, 2023 at 9:52 AM Jan Hubicka via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > Hi,
> > > > > this was suggested earlier somewhere, but I can not find the thread.
> > > > > C++ has assume attribute that expands int
> > > > >   if (conditional)
> > > > >     __builtin_unreachable ()
> > > > > We do not want to account the conditional in inline heuristics since
> > > > > we know that it is going to be optimized out.
> > > > >
> > > > > Bootstrapped/regtested x86_64-linux, will commit it later today if
> > > > > thre are no complains.
> > > >
> > > > I think we also had the request to not account the condition feeding
> > > > stmts (if they only feed it and have no side-effects).  libstdc++ has
> > > > complex range comparisons here.  Also ...
> > >
> > > I was thinking of this: it depends on how smart do we want to get.
> > > We also have dead conditionals guarding clobbers, predicts and other
> > > stuff.  In general we can use mark phase of cd-dce telling it to ignore
> > > those statements and then use its resut in the analysis.
> >
> > Hmm, possible but a bit heavy-handed.  There's simple_dce_from_worklist
> > which might be a way to do this (of course we cannot use that 1:1).  Also
> > then consider
> >
> >  a = a + 1;
> >  if (a > 10)
> >    __builtin_unreachable ();
> >  if (a < 5)
> >    __builtin_unreachable ();
> >
> > and a has more than one use but both are going away.  So indeed a
> > more global analysis would be needed to get the full benefit.
>
> I was looking into simple_dce_from_worklist and if I understand it
> right, it simply walks list of SSA names which probably lost some uses
> by the consuming pass. If they have zero non-debug uses and defining statement has
> no side effects, then they are removed.
>
> I think this is not really fitting the bill here since the example above
> is likely to be common and also if we want one assign filling
> conditional optimized out, we probably want to handle case with multiple
> assignments.  What about
>  1) walk function body and see if there are conditionals we know will be
>     optimized out (at the begining those can be only those which has one
>     arm reaching __bulitin_unreachable
>  2) if there are none, just proceed with fnsummary construction
>  3) if there were some, do non-cd-dce mark stage which will skip those
>     dead conditional identified in 1
>     and proceed to fnsummary construction with additional bitmap of
>     marked stmts.

So you need to feed it with extra info on the optimized out stmts because
as-is it will not remove __builtin_unreachable ().  That means you're
doing the find_obviously_necessary_stmts manually, skipping the
conditional and all stmts it controls to the __builtin_unreachable () path?

I also think you want something cheaper than non-cd-dce mark, you also don't
want to bother with stores/loads?

Also when you only do this conditional how do you plan to use the result?

Richard.

>
> This should be cheaper than unconditionally doing cd-dce and should
> handle common cases?
> Honza
Jan Hubicka June 23, 2023, 1:19 p.m. UTC | #7
> 
> So you need to feed it with extra info on the optimized out stmts because
> as-is it will not remove __builtin_unreachable ().  That means you're

My plan was to add an entry point to tree-ssa-dce that will take a
set of stmts declared dead by external force and will do the usual mark
stage, bypassing mark_stmt_if_necessary if the stmt is in the set of
dead stmts.

> doing the find_obviously_necessary_stmts manually, skipping the
> conditional and all stmts it controls to the __builtin_unreachable () path?
> 
> I also think you want something cheaper than non-cd-dce mark, you also don't
> want to bother with stores/loads?

You are probably right. cd-dce marking became bit of a monster and I do
not want to care about memory.
One can add extra flag to avoid processing of memory, but the code I
would re-use is quite small.

I can do my own mark&sweep, just considering phis, pre-identified
conditionals and basic gimple_assigns with no side effects as possibly
unnecessary stmts.  I can completely ignore debug stmts.

So it should be one pass through the statements to populate the worklist
and a simple walk of the SSA graph to propagate it.

> 
> Also when you only do this conditional how do you plan to use the result?

Well, the analysis is a loop that walks all basic blocks and then all
stmts.  I can keep track of whether the computation of live stmts was
done, and in that case query the flag, assuming it is true otherwise.

Honza
diff mbox series

Patch

diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
index a5f5a50c8a5..987da29ec34 100644
--- a/gcc/ipa-fnsummary.cc
+++ b/gcc/ipa-fnsummary.cc
@@ -2649,6 +2649,54 @@  points_to_possible_sra_candidate_p (tree t)
   return false;
 }
 
+/* Return true if BB is builtin_unreachable.
+   We skip empty basic blocks, debug statements, clobbers and predicts.
+   CACHE is used to memoize already analyzed blocks.  */
+
+static bool
+builtin_unreachable_bb_p (basic_block bb, vec<unsigned char> &cache)
+{
+  if (cache[bb->index])
+    return cache[bb->index] - 1;
+  gimple_stmt_iterator si;
+  auto_vec <basic_block, 4> visited_bbs;
+  bool ret = false;
+  while (true)
+    {
+      bool empty_bb = true;
+      visited_bbs.safe_push (bb);
+      cache[bb->index] = 3;
+      for (si = gsi_start_nondebug_bb (bb);
+	   !gsi_end_p (si) && empty_bb;
+	   gsi_next_nondebug (&si))
+	{
+	  if (gimple_code (gsi_stmt (si)) != GIMPLE_PREDICT
+	      && !gimple_clobber_p (gsi_stmt (si))
+	      && !gimple_nop_p (gsi_stmt (si)))
+	    {
+	      empty_bb = false;
+	      break;
+	    }
+	}
+      if (!empty_bb)
+	break;
+      else
+	bb = single_succ_edge (bb)->dest;
+      if (cache[bb->index])
+	{
+	  ret = cache[bb->index] == 3 ? false : cache[bb->index] - 1;
+	  goto done;
+	}
+    }
+  if (gimple_call_builtin_p (gsi_stmt (si), BUILT_IN_UNREACHABLE)
+      || gimple_call_builtin_p (gsi_stmt (si), BUILT_IN_UNREACHABLE_TRAP))
+    ret = true;
+done:
+  for (basic_block vbb:visited_bbs)
+    cache[vbb->index] = (unsigned char)ret + 1;
+  return ret;
+}
+
 /* Analyze function body for NODE.
    EARLY indicates run from early optimization pipeline.  */
 
@@ -2743,6 +2791,8 @@  analyze_function_body (struct cgraph_node *node, bool early)
   const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
   order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
   nblocks = pre_and_rev_post_order_compute (NULL, order, false);
+  auto_vec<unsigned char, 10> cache;
+  cache.safe_grow_cleared (last_basic_block_for_fn (cfun));
   for (n = 0; n < nblocks; n++)
     {
       bb = BASIC_BLOCK_FOR_FN (cfun, order[n]);
@@ -2901,6 +2951,24 @@  analyze_function_body (struct cgraph_node *node, bool early)
 		}
 	    }
 
+	  /* Conditionals guarding __builtin_unreachable will be
+	     optimized out.  */
+	  if (gimple_code (stmt) == GIMPLE_COND)
+	    {
+	      edge_iterator ei;
+	      edge e;
+	      FOR_EACH_EDGE (e, ei, bb->succs)
+		if (builtin_unreachable_bb_p (e->dest, cache))
+		  {
+		    if (dump_file)
+		      fprintf (dump_file,
+			       "\t\tConditional guarding __builtin_unreachable; ignored\n");
+		    this_time = 0;
+		    this_size = 0;
+		    break;
+		  }
+	    }
+
 	  /* TODO: When conditional jump or switch is known to be constant, but
 	     we did not translate it into the predicates, we really can account
 	     just maximum of the possible paths.  */
diff --git a/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c b/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c
new file mode 100644
index 00000000000..a0ece0c300b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/fnsummary-1.c
@@ -0,0 +1,8 @@ 
+/* { dg-options "-O2 -fdump-ipa-fnsummary"  } */
+int
+test(int a)
+{
+	if (a > 10)
+		__builtin_unreachable ();
+}
+/* { dg-final { scan-ipa-dump "Conditional guarding __builtin_unreachable" "fnsummary"  } } */