preserve more debug stmts in gimple jump threading

Message ID orlfz82koi.fsf@lxoliva.fsfla.org
State New

Commit Message

Alexandre Oliva May 15, 2019, 8:20 a.m. UTC
Gimple jump threading does not duplicate forwarder blocks that might
be present before or after the second copied block.  This silently
drops debug binds and markers that might be present in them.  This
patch attempts to preserve them.

For blocks after either copied block, we attempt to append debug stmts
to the copied block, if it does not end with a block-ending stmt.
Failing that, for blocks between both copied blocks, we prepend their
debug stmts to the copy of the second block.

If everything fails, we still drop debug stmts on the floor, though
preexisting code consolidates debug binds in the block that threading
flows into, so only markers are really lost.  We can't do much better
than that without conditional binds and markers, or debug stmts in
edges, or somesuch.

If we append debug stmts to a reusable template block, we copy it
after splitting out the debug stmts, and before putting them back.

Regstrapped on x86_64-linux-gnu and i686-linux-gnu.  Ok to install?


for  gcc/ChangeLog

	* tree-ssa-threadupdate.c (struct ssa_local_info_t): Add
	field template_last_to_copy.
	(ssa_create_duplicates): Set it, and use it.  Attempt to
	preserve more debug stmts.
---
 gcc/tree-ssa-threadupdate.c |   68 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)

Comments

Richard Biener May 15, 2019, 11:13 a.m. UTC | #1
On Wed, May 15, 2019 at 10:20 AM Alexandre Oliva <aoliva@redhat.com> wrote:
>
> Gimple jump threading does not duplicate forwarder blocks that might
> be present before or after the second copied block.

Empty forwarder blocks I presume?

>  This silently
> drops debug binds and markers that might be present in them.  This
> patch attempts to preserve them.
>
> For blocks after either copied block, we attempt to append debug stmts
> to the copied block, if it does not end with a block-ending stmt.
> Failing that, for blocks between both copied blocks, we prepend its
> debug stmts to the copy of the second block.
>
> If everything fails, we still drop debug stmts on the floor, though
> preexisting code consolidates debug binds in the block that threading
> flows into, so only markers are really lost.  We can't do much better
> than that without conditional binds and markers, or debug stmts in
> edges, or somesuch.
>
> If we append debug stmts to a reusable template block, we copy it
> after splitting out the debug stmts, and before putting them back.
>
> Regstrapped on x86_64-linux-gnu and i686-linux-gnu.  Ok to install?

I wonder if not copying empty forwarders is premature optimization
in that (hopefully) CFG cleanup is run after jump threading anyways?

If we'd copy the blocks the patch would be moot?

Richard.

>
> for  gcc/ChangeLog
>
>         * tree-ssa-threadupdate.c (struct ssa_local_info_t): Add
>         field template_last_to_copy.
>         (ssa_create_duplicates): Set it, and use it.  Attempt to
>         preserve more debug stmts.
> ---
>  gcc/tree-ssa-threadupdate.c |   68 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 68 insertions(+)
>
> diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
> index bccef879db03..d63154317746 100644
> --- a/gcc/tree-ssa-threadupdate.c
> +++ b/gcc/tree-ssa-threadupdate.c
> @@ -235,6 +235,12 @@ struct ssa_local_info_t
>       and sharing a template for that block is considerably more difficult.  */
>    basic_block template_block;
>
> +  /* If we append debug stmts to the template block after creating it,
> +     this iterator won't be the last one in the block, and further
> +     copies of the template block shouldn't get debug stmts after
> +     it.  */
> +  gimple_stmt_iterator template_last_to_copy;
> +
>    /* Blocks duplicated for the thread.  */
>    bitmap duplicate_blocks;
>
> @@ -1124,6 +1130,8 @@ ssa_create_duplicates (struct redirection_data **slot,
>        create_block_for_threading ((*path)[1]->e->src, rd, 0,
>                                   &local_info->duplicate_blocks);
>        local_info->template_block = rd->dup_blocks[0];
> +      local_info->template_last_to_copy
> +       = gsi_last_bb (local_info->template_block);
>
>        /* We do not create any outgoing edges for the template.  We will
>          take care of that in a later traversal.  That way we do not
> @@ -1131,14 +1139,74 @@ ssa_create_duplicates (struct redirection_data **slot,
>      }
>    else
>      {
> +      gimple_seq seq = NULL;
> +      if (gsi_stmt (local_info->template_last_to_copy)
> +         != gsi_stmt (gsi_last_bb (local_info->template_block)))
> +       seq = gsi_split_seq_after (local_info->template_last_to_copy);
>        create_block_for_threading (local_info->template_block, rd, 0,
>                                   &local_info->duplicate_blocks);
> +      if (seq)
> +       gsi_insert_seq_after (&local_info->template_last_to_copy,
> +                             seq, GSI_SAME_STMT);
>
>        /* Go ahead and wire up outgoing edges and update PHIs for the duplicate
>          block.   */
>        ssa_fix_duplicate_block_edges (rd, local_info);
>      }
>
> +  if (MAY_HAVE_DEBUG_STMTS)
> +    {
> +      /* Copy debug stmts from each NO_COPY src block to the block
> +        that would have been its predecessor, if we can append to it
> +        (we can't add stmts after a block-ending stmt), or prepending
> +        to the duplicate of the successor, if there is one.  If
> +        there's no duplicate successor, we'll mostly drop the blocks
> +        on the floor; propagate_threaded_block_debug_into, called
> +        elsewhere, will consolidate and preserve the effects of the
> +        binds, but none of the markers.  */
> +      gimple_stmt_iterator copy_to = gsi_last_bb (rd->dup_blocks[0]);
> +      if (!gsi_end_p (copy_to))
> +       {
> +         if (stmt_ends_bb_p (gsi_stmt (copy_to)))
> +           {
> +             if (rd->dup_blocks[1])
> +               copy_to = gsi_after_labels (rd->dup_blocks[1]);
> +             else
> +               copy_to = gsi_none ();
> +           }
> +         else
> +           gsi_next (&copy_to);
> +       }
> +      for (unsigned int i = 2, j = 0; i < path->length (); i++)
> +       if ((*path)[i]->type == EDGE_NO_COPY_SRC_BLOCK
> +           && gsi_bb (copy_to))
> +         {
> +           for (gimple_stmt_iterator gsi = gsi_start_bb ((*path)[i]->e->src);
> +                !gsi_end_p (gsi); gsi_next (&gsi))
> +             {
> +               if (!is_gimple_debug (gsi_stmt (gsi)))
> +                 continue;
> +               gimple *stmt = gsi_stmt (gsi);
> +               gimple *copy = gimple_copy (stmt);
> +               gsi_insert_before (&copy_to, copy, GSI_SAME_STMT);
> +             }
> +         }
> +       else if ((*path)[i]->type == EDGE_COPY_SRC_BLOCK
> +                || (*path)[i]->type == EDGE_COPY_SRC_JOINER_BLOCK)
> +         {
> +           j++;
> +           gcc_assert (j < 2);
> +           copy_to = gsi_last_bb (rd->dup_blocks[j]);
> +           if (!gsi_end_p (copy_to))
> +             {
> +               if (stmt_ends_bb_p (gsi_stmt (copy_to)))
> +                 copy_to = gsi_none ();
> +               else
> +                 gsi_next (&copy_to);
> +             }
> +         }
> +    }
> +
>    /* Keep walking the hash table.  */
>    return 1;
>  }
>
>
> --
> Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
> Be the change, be Free!                 FSF Latin America board member
> GNU Toolchain Engineer                        Free Software Evangelist
> Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara
Alexandre Oliva May 15, 2019, 9:03 p.m. UTC | #2
On May 15, 2019, Richard Biener <richard.guenther@gmail.com> wrote:

> On Wed, May 15, 2019 at 10:20 AM Alexandre Oliva <aoliva@redhat.com> wrote:
>> 
>> Gimple jump threading does not duplicate forwarder blocks that might
>> be present before or after the copied block.

> Empty forwarder blocks I presume?

Empty except for debug stmts and possibly a final conditional jump that,
in the threading path, becomes unconditional.

> I wonder if not copying empty forwarders is premature optimization
> in that (hopefully) CFG cleanup is run after jump threading anyways?

I don't think it's so much of an optimization as it is an extension.
The code as it is can only really handle copying the initial block and
another subsequent block.  Threading through other "empty" blocks
didn't require changes as significant as copying them would have.

If we were to change that, then we'd be extending jump threading to
actually copy multiple blocks, empty or not.  Not necessarily a bad
thing, but there are comments in the code suggesting that hasn't really
been tried.

> If we'd copy the blocks the patch would be moot?

Yes, but with a caveat.

It's true that cfgcleanup would pretty much end up having the same or
similar effect, consolidating debug stmts from empty blocks into
successor or predecessor if possible and dropping them otherwise.

What it would not do is to consolidate binds from the now-split paths
into the confluence, like propagate_threaded_block_debug_into does.  The
proposed patch does not change that, and it shouldn't: the confluence
binds stand on their own (*), very much like the binds we add after PHI
nodes when entering SSA.

That reduces the loss when we don't have a place to hold the debug
stmts, and it helps bind resolution in var-tracking even when the
incoming paths seem to have diverged in where they hold a variable, at
least when the binds don't end up being reset for this very reason even
when var-tracking dataflow could have figured out a common location
among the incoming blocks.

(*) Though, it occurs to me now, memory references in them might become
wrong if they appear before the second block and the second block
changes the memory locations; I haven't actually gone back to check
whether the consolidation pays any attention to that.

We shouldn't assume that just because we're now copying all intermediate
blocks, debug info and all, the confluence binds cease to be useful.
Rather, we should probably consider introducing them in other passes
that split paths and introduce new confluences.
Richard Biener May 16, 2019, 11:03 a.m. UTC | #3
On Wed, May 15, 2019 at 11:03 PM Alexandre Oliva <aoliva@redhat.com> wrote:
>
> On May 15, 2019, Richard Biener <richard.guenther@gmail.com> wrote:
>
> > On Wed, May 15, 2019 at 10:20 AM Alexandre Oliva <aoliva@redhat.com> wrote:
> >>
> >> Gimple jump threading does not duplicate forwarder blocks that might
> >> be present before or after the copied block.
>
> > Empty forwarder blocks I presume?
>
> Empty except for debug stmts and possibly a final conditional jump that,
> in the threading path, becomes unconditional.
>
> > I wonder if not copying empty forwarders is premature optimization
> > in that (hopefully) CFG cleanup is run after jump threading anyways?
>
> I don't think it's so much of an optimization as it is an extension.
> The code as it is can only really handle copying the initial block and
> another subsequent block.  Threading through other "empty" blocks
> didn't require as significant changes as copying them would have.
>
> If we were to change that, then we'd be extending jump threading to
> actually copy multiple blocks, empty or not.  Not necessarily a bad
> thing, but there are comments in the code suggesting that hasn't really
> been tried.

Oh, I see - I thought we've extended the code to copy an arbitrary length
path by now ...

> > If we'd copy the blocks the patch would be moot?
>
> Yes, but with a caveat.
>
> It's true that cfgcleanup would pretty much end up having the same or
> similar effect, consolidating debug stmts from empty blocks into
> successor or predecessor if possible and dropping them otherwise.
>
> What it would not do is to consolidate binds from the now-split paths
> into the confluence, like propagate_threaded_block_debug_into does.  The
> proposed patch does not change that, and it shouldn't: the confluence
> binds stand on their own (*), very much like the binds we add after PHI
> nodes when entering SSA.

Ah, I missed that, so yes, it looks more powerful.

> That reduces the loss when we don't have a place to hold the debug
> stmts, and it helps bind resolution in var-tracking even when the
> incoming paths seem to have diverged in where they hold a variable, at
> least when the binds don't end up being reset for this very reason even
> when var-tracking dataflow could have figured out a common location
> among the incoming blocks.
>
> (*) though memory references in them might become wrong if they're
> before the second block and the second block changes the memory
> locations, it occurs to me now, without actually going back and checking
> whether the consolidation pays any attention to that.
>
> We shouldn't assume that just because we're now copying all intermediate
> blocks, debug info and all, the confluence binds cease to be useful.
> Rather, we should probably consider introducing them in other passes
> that split paths and introduce new confluences.

Indeed.

Richard.

> --
> Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
> Be the change, be Free!                 FSF Latin America board member
> GNU Toolchain Engineer                        Free Software Evangelist
> Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara
Jeff Law May 16, 2019, 4:14 p.m. UTC | #4
On 5/15/19 3:03 PM, Alexandre Oliva wrote:
> On May 15, 2019, Richard Biener <richard.guenther@gmail.com> wrote:
> 
>> On Wed, May 15, 2019 at 10:20 AM Alexandre Oliva <aoliva@redhat.com> wrote:
>>>
>>> Gimple jump threading does not duplicate forwarder blocks that might
>>> be present before or after the copied block.
> 
>> Empty forwarder blocks I presume?
> 
> Empty except for debug stmts and possibly a final conditional jump that,
> in the threading path, becomes unconditional.
Right.  The tree-ssa-threadupdate code all pre-dates the SEME copier
which is a *much* better way to handle duplicating the region.

Initially we allowed only one block with side effects in the jump
threading path.  That's all we really knew how to do correctly.  We
extended that to ignore forwarders at some point since they didn't need
to be copied -- you just need to know where the forwarder block will go.

We later added the ability to copy a second block with side effects in
the jump threading path.

But I'd really like to just remove tree-ssa-threadupdate.c.  It's
horribly convoluted due to old requirements.  I'm confident we could use
the SEME copier to handle all the existing cases in a much simpler fashion.



Jeff
Richard Biener May 16, 2019, 6:46 p.m. UTC | #5
On Thu, May 16, 2019 at 6:14 PM Jeff Law <law@redhat.com> wrote:
>
> On 5/15/19 3:03 PM, Alexandre Oliva wrote:
> > On May 15, 2019, Richard Biener <richard.guenther@gmail.com> wrote:
> >
> >> On Wed, May 15, 2019 at 10:20 AM Alexandre Oliva <aoliva@redhat.com> wrote:
> >>>
> >>> Gimple jump threading does not duplicate forwarder blocks that might
> >>> be present before or after the copied block.
> >
> >> Empty forwarder blocks I presume?
> >
> > Empty except for debug stmts and possibly a final conditional jump that,
> > in the threading path, becomes unconditional.
> Right.  The tree-ssa-threadupdate code all pre-dates the SEME copier
> which is a *much* better way to handle duplicating the region.
>
> Initially we allowed only one block with side effects in the jump
> threading path.  That's all we really knew how to do correctly.  We
> extended that to ignore forwarders at some point since they didn't need
> to be copied -- you just need to know where the forwarder block will go.
>
> We later added the ability to copy a second block with side effects in
> the jump threading path.
>
> But I'd really like to just remove tree-ssa-threadupdate.c.  It's
> horribly convoluted due to old requirements.  I'm confident we could use
> the SEME copier to handle all the existing cases in a much simpler fashion.

Not sure if that's the best infrastructure to use (it cannot copy a path
crossing a backedge).  tracer does the duplicating incrementally
for example.  Technically the duplication isn't difficult but some
simplification on the fly would be nice (like actually merging the
blocks and propagating out PHIs and constants).

Eventually I'll find cycles to implement sth like this in a greedy
fashion for loop unrolling.

Richard.

>
>
> Jeff
Jeff Law May 16, 2019, 6:52 p.m. UTC | #6
On 5/16/19 12:46 PM, Richard Biener wrote:
> On Thu, May 16, 2019 at 6:14 PM Jeff Law <law@redhat.com> wrote:
>>
>> On 5/15/19 3:03 PM, Alexandre Oliva wrote:
>>> On May 15, 2019, Richard Biener <richard.guenther@gmail.com> wrote:
>>>
>>>> On Wed, May 15, 2019 at 10:20 AM Alexandre Oliva <aoliva@redhat.com> wrote:
>>>>>
>>>>> Gimple jump threading does not duplicate forwarder blocks that might
>>>>> be present before or after the copied block.
>>>
>>>> Empty forwarder blocks I presume?
>>>
>>> Empty except for debug stmts and possibly a final conditional jump that,
>>> in the threading path, becomes unconditional.
>> Right.  The tree-ssa-threadupdate code all pre-dates the SEME copier
>> which is a *much* better way to handle duplicating the region.
>>
>> Initially we allowed only one block with side effects in the jump
>> threading path.  That's all we really knew how to do correctly.  We
>> extended that to ignore forwarders at some point since they didn't need
>> to be copied -- you just need to know where the forwarder block will go.
>>
>> We later added the ability to copy a second block with side effects in
>> the jump threading path.
>>
>> But I'd really like to just remove tree-ssa-threadupdate.c.  It's
>> horribly convoluted due to old requirements.  I'm confident we could use
>> the SEME copier to handle all the existing cases in a much simpler fashion.
> 
> Not sure if that's the best infrastructure to use (it cannot copy a path
> crossing a backedge).  tracer does the duplicating incrementally
> for example.  Technically the duplication isn't difficult but some
> simplification on the fly would be nice (like actually merging the
> blocks and propagating out PHIs and constants).
We don't need to worry about backedges in this code anymore.

And yes, some simplification would be helpful.  In fact using the SEME
copier actually helps with that because it requires a bit more structure.

The biggest downside I see with moving to the SEME copier here would be
that when we have multiple incoming edges that thread to the same
outgoing edge, the current copier will create a single duplicate.  We'd
likely end up with multiple duplicates using the SEME copier.

Jeff
Jeff Law May 17, 2019, 3:36 p.m. UTC | #7
On 5/15/19 2:20 AM, Alexandre Oliva wrote:
> Gimple jump threading does not duplicate forwarder blocks that might
> be present before or after the second copied block.  This silently
> drops debug binds and markers that might be present in them.  This
> patch attempts to preserve them.
> 
> For blocks after either copied block, we attempt to append debug stmts
> to the copied block, if it does not end with a block-ending stmt.
> Failing that, for blocks between both copied blocks, we prepend its
> debug stmts to the copy of the second block.
> 
> If everything fails, we still drop debug stmts on the floor, though
> preexisting code consolidates debug binds in the block that threading
> flows into, so only markers are really lost.  We can't do much better
> than that without conditional binds and markers, or debug stmts in
> edges, or somesuch.
> 
> If we append debug stmts to a reusable template block, we copy it
> after splitting out the debug stmts, and before putting them back.
> 
> Regstrapped on x86_64-linux-gnu and i686-linux-gnu.  Ok to install?
> 
> 
> for  gcc/ChangeLog
> 
> 	* tree-ssa-threadupdate.c (struct ssa_local_info_t): Add
> 	field template_last_to_copy.
> 	(ssa_create_duplicates): Set it, and use it.  Attempt to
> 	preserve more debug stmts.
OK.  Presumably creating a reliable testcase was painful?

jeff
> ---
Alexandre Oliva May 21, 2019, 5:15 p.m. UTC | #8
On May 17, 2019, Jeff Law <law@redhat.com> wrote:

> OK.  Presumably creating a reliable testcase was painful?

Heh, that it might even be possible didn't cross my mind.  I was happy
enough to be able to exercise and inspect the before&after IR for most
of the new code paths in the patch in a GCC bootstrap, by placing
gcc_assert(gcc_stop_here_*||getenv("GCC_STOP_HERE_*")) at various
places, so that I could stop to inspect, skip them in a debugger or in a
full run to look at the compiler dumps.  That alone took much longer
than I had to complete it :-(

Patch

diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index bccef879db03..d63154317746 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -235,6 +235,12 @@  struct ssa_local_info_t
      and sharing a template for that block is considerably more difficult.  */
   basic_block template_block;
 
+  /* If we append debug stmts to the template block after creating it,
+     this iterator won't be the last one in the block, and further
+     copies of the template block shouldn't get debug stmts after
+     it.  */
+  gimple_stmt_iterator template_last_to_copy;
+
   /* Blocks duplicated for the thread.  */
   bitmap duplicate_blocks;
 
@@ -1124,6 +1130,8 @@  ssa_create_duplicates (struct redirection_data **slot,
       create_block_for_threading ((*path)[1]->e->src, rd, 0,
 				  &local_info->duplicate_blocks);
       local_info->template_block = rd->dup_blocks[0];
+      local_info->template_last_to_copy
+	= gsi_last_bb (local_info->template_block);
 
       /* We do not create any outgoing edges for the template.  We will
 	 take care of that in a later traversal.  That way we do not
@@ -1131,14 +1139,74 @@  ssa_create_duplicates (struct redirection_data **slot,
     }
   else
     {
+      gimple_seq seq = NULL;
+      if (gsi_stmt (local_info->template_last_to_copy)
+	  != gsi_stmt (gsi_last_bb (local_info->template_block)))
+	seq = gsi_split_seq_after (local_info->template_last_to_copy);
       create_block_for_threading (local_info->template_block, rd, 0,
 				  &local_info->duplicate_blocks);
+      if (seq)
+	gsi_insert_seq_after (&local_info->template_last_to_copy,
+			      seq, GSI_SAME_STMT);
 
       /* Go ahead and wire up outgoing edges and update PHIs for the duplicate
 	 block.   */
       ssa_fix_duplicate_block_edges (rd, local_info);
     }
 
+  if (MAY_HAVE_DEBUG_STMTS)
+    {
+      /* Copy debug stmts from each NO_COPY src block to the block
+	 that would have been its predecessor, if we can append to it
+	 (we can't add stmts after a block-ending stmt), or prepending
+	 to the duplicate of the successor, if there is one.  If
+	 there's no duplicate successor, we'll mostly drop the blocks
+	 on the floor; propagate_threaded_block_debug_into, called
+	 elsewhere, will consolidate and preserve the effects of the
+	 binds, but none of the markers.  */
+      gimple_stmt_iterator copy_to = gsi_last_bb (rd->dup_blocks[0]);
+      if (!gsi_end_p (copy_to))
+	{
+	  if (stmt_ends_bb_p (gsi_stmt (copy_to)))
+	    {
+	      if (rd->dup_blocks[1])
+		copy_to = gsi_after_labels (rd->dup_blocks[1]);
+	      else
+		copy_to = gsi_none ();
+	    }
+	  else
+	    gsi_next (&copy_to);
+	}
+      for (unsigned int i = 2, j = 0; i < path->length (); i++)
+	if ((*path)[i]->type == EDGE_NO_COPY_SRC_BLOCK
+	    && gsi_bb (copy_to))
+	  {
+	    for (gimple_stmt_iterator gsi = gsi_start_bb ((*path)[i]->e->src);
+		 !gsi_end_p (gsi); gsi_next (&gsi))
+	      {
+		if (!is_gimple_debug (gsi_stmt (gsi)))
+		  continue;
+		gimple *stmt = gsi_stmt (gsi);
+		gimple *copy = gimple_copy (stmt);
+		gsi_insert_before (&copy_to, copy, GSI_SAME_STMT);
+	      }
+	  }
+	else if ((*path)[i]->type == EDGE_COPY_SRC_BLOCK
+		 || (*path)[i]->type == EDGE_COPY_SRC_JOINER_BLOCK)
+	  {
+	    j++;
+	    gcc_assert (j < 2);
+	    copy_to = gsi_last_bb (rd->dup_blocks[j]);
+	    if (!gsi_end_p (copy_to))
+	      {
+		if (stmt_ends_bb_p (gsi_stmt (copy_to)))
+		  copy_to = gsi_none ();
+		else
+		  gsi_next (&copy_to);
+	      }
+	  }
+    }
+
   /* Keep walking the hash table.  */
   return 1;
 }