diff mbox

Fix pr60363 by adding backtraced value of phi arg along jump threading path

Message ID 006901cf4292$a78407e0$f68c17a0$@arm.com
State New
Headers show

Commit Message

Bin Cheng March 18, 2014, 10:13 a.m. UTC
Hi,
After control flow graph change made by
http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01492.html, case
gcc.dg/tree-ssa/ssa-dom-thread-4.c is broken on logical_op_short_circuit
targets including cortex-m3/cortex-m0.
The regression reveals a missed opportunity in jump threading, which causes
a forward basic block doesn't get removed in cfgcleanup after jump threading
in VRP1.  Root cause is stated at the corresponding PR:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60363, please refer to it for
detailed report.

This patch fixes the issue by adding constant value instead of ssa_name as
the new phi argument.  Bootstrap and test on x86_64, also test on cortex-m3
and the regression is gone.
I think this should wait for stage1, but would like to hear some comments
now.  So does it look reasonable?


2014-03-18  Bin Cheng  <bin.cheng@arm.com>

	PR regression/60363
	* gcc/tree-ssa-threadupdate.c (get_value_locus_in_path): New.
	(copy_phi_args): New parameters.  Call get_value_locus_in_path.
	(update_destination_phis): New parameter.
	(create_edge_and_update_destination_phis): Ditto.
	(ssa_fix_duplicate_block_edges): Pass new arguments.
	(thread_single_edge): Ditto.

Comments

Jeff Law April 17, 2014, 5:30 a.m. UTC | #1
On 03/18/14 04:13, bin.cheng wrote:
> Hi,
> After control flow graph change made by
> http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01492.html, case
> gcc.dg/tree-ssa/ssa-dom-thread-4.c is broken on logical_op_short_circuit
> targets including cortex-m3/cortex-m0.
> The regression reveals a missed opportunity in jump threading, which causes
> a forward basic block doesn't get removed in cfgcleanup after jump threading
> in VRP1.  Root cause is stated at the corresponding PR:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60363, please refer to it for
> detailed report.
>
> This patch fixes the issue by adding constant value instead of ssa_name as
> the new phi argument.  Bootstrap and test on x86_64, also test on cortex-m3
> and the regression is gone.
> I think this should wait for stage1, but would like to hear some comments
> now.  So does it look reasonable?
>
>
> 2014-03-18  Bin Cheng<bin.cheng@arm.com>
>
> 	PR regression/60363
> 	* gcc/tree-ssa-threadupdate.c (get_value_locus_in_path): New.
> 	(copy_phi_args): New parameters.  Call get_value_locus_in_path.
> 	(update_destination_phis): New parameter.
> 	(create_edge_and_update_destination_phis): Ditto.
> 	(ssa_fix_duplicate_block_edges): Pass new arguments.
> 	(thread_single_edge): Ditto.
This is a good and interesting catch. DOM knows how to propagate these 
context sensitive equivalences which should expose the optimizable 
forwarder blocks.

But I'm a big believer in catching as many CFG simplifications as early 
as we can as they tend to have nice cascading effects.  So if we can 
pick it up by being smarter in how we duplicate arguments, then I'm all 
for it.

> +  for (int j = idx - 1; j >= 0; j--)
> +    {
> +      edge e = (*path)[j]->e;
> +      if (e->dest == def_bb)
> +	{
> +	  arg = gimple_phi_arg_def (def_phi, e->dest_idx);
> +	  *locus = gimple_phi_arg_location (def_phi, e->dest_idx);
> +	  return (TREE_CODE (arg) == INTEGER_CST ? arg : def);
Presumably any constant that can legitimately appear in a PHI node is 
good here.  So for example ADDR_EXPR <something in static storage> ought 
to be handled as well.

One could also argue that we should go ahead and do a context sensitive 
copy propagation here too if ARG turns out to be an SSA_NAME.  You have 
to be a bit more careful with those and use may_propagate_copy_p and 
you'd probably want to test the loop depth of the SSA_NAMEs to ensure 
you're not doing a propagation that is going to muck up LICM.  See 
loop_depth_of_name uses in tree-ssa-dom.c.

Overall I think it's good.  We just need to resolve whether or not we 
want to catch constant ADDR_EXPRs and/or do the context sensitive copy 
propagations.

jeff
Richard Biener April 17, 2014, 8 a.m. UTC | #2
On Thu, Apr 17, 2014 at 7:30 AM, Jeff Law <law@redhat.com> wrote:
> On 03/18/14 04:13, bin.cheng wrote:
>>
>> Hi,
>> After control flow graph change made by
>> http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01492.html, case
>> gcc.dg/tree-ssa/ssa-dom-thread-4.c is broken on logical_op_short_circuit
>> targets including cortex-m3/cortex-m0.
>> The regression reveals a missed opportunity in jump threading, which
>> causes
>> a forward basic block doesn't get removed in cfgcleanup after jump
>> threading
>> in VRP1.  Root cause is stated at the corresponding PR:
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60363, please refer to it for
>> detailed report.
>>
>> This patch fixes the issue by adding constant value instead of ssa_name as
>> the new phi argument.  Bootstrap and test on x86_64, also test on
>> cortex-m3
>> and the regression is gone.
>> I think this should wait for stage1, but would like to hear some comments
>> now.  So does it look reasonable?
>>
>>
>> 2014-03-18  Bin Cheng<bin.cheng@arm.com>
>>
>>         PR regression/60363
>>         * gcc/tree-ssa-threadupdate.c (get_value_locus_in_path): New.
>>         (copy_phi_args): New parameters.  Call get_value_locus_in_path.
>>         (update_destination_phis): New parameter.
>>         (create_edge_and_update_destination_phis): Ditto.
>>         (ssa_fix_duplicate_block_edges): Pass new arguments.
>>         (thread_single_edge): Ditto.
>
> This is a good and interesting catch. DOM knows how to propagate these
> context sensitive equivalences which should expose the optimizable forwarder
> blocks.
>
> But I'm a big believer in catching as many CFG simplifications as early as
> we can as they tend to have nice cascading effects.  So if we can pick it up
> by being smarter in how we duplicate arguments, then I'm all for it.
>
>> +  for (int j = idx - 1; j >= 0; j--)
>> +    {
>> +      edge e = (*path)[j]->e;
>> +      if (e->dest == def_bb)
>> +       {
>> +         arg = gimple_phi_arg_def (def_phi, e->dest_idx);
>> +         *locus = gimple_phi_arg_location (def_phi, e->dest_idx);
>> +         return (TREE_CODE (arg) == INTEGER_CST ? arg : def);
>
> Presumably any constant that can legitimately appear in a PHI node is good
> here.  So for example ADDR_EXPR <something in static storage> ought to be
> handled as well.
>
> One could also argue that we should go ahead and do a context sensitive copy
> propagation here too if ARG turns out to be an SSA_NAME.  You have to be a
> bit more careful with those and use may_propagate_copy_p and you'd probably
> want to test the loop depth of the SSA_NAMEs to ensure you're not doing a
> propagation that is going to muck up LICM.  See loop_depth_of_name uses in
> tree-ssa-dom.c.
>
> Overall I think it's good.  We just need to resolve whether or not we want
> to catch constant ADDR_EXPRs and/or do the context sensitive copy
> propagations.

Simply use is_gimple_min_invariant (arg) ? arg : def

> jeff
Bin.Cheng April 17, 2014, 10:07 a.m. UTC | #3
On Thu, Apr 17, 2014 at 1:30 PM, Jeff Law <law@redhat.com> wrote:
> On 03/18/14 04:13, bin.cheng wrote:
>>
>> Hi,
>> After control flow graph change made by
>> http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01492.html, case
>> gcc.dg/tree-ssa/ssa-dom-thread-4.c is broken on logical_op_short_circuit
>> targets including cortex-m3/cortex-m0.
>> The regression reveals a missed opportunity in jump threading, which
>> causes
>> a forward basic block doesn't get removed in cfgcleanup after jump
>> threading
>> in VRP1.  Root cause is stated at the corresponding PR:
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60363, please refer to it for
>> detailed report.
>>
>> This patch fixes the issue by adding constant value instead of ssa_name as
>> the new phi argument.  Bootstrap and test on x86_64, also test on
>> cortex-m3
>> and the regression is gone.
>> I think this should wait for stage1, but would like to hear some comments
>> now.  So does it look reasonable?
>>
>>
>> 2014-03-18  Bin Cheng<bin.cheng@arm.com>
>>
>>         PR regression/60363
>>         * gcc/tree-ssa-threadupdate.c (get_value_locus_in_path): New.
>>         (copy_phi_args): New parameters.  Call get_value_locus_in_path.
>>         (update_destination_phis): New parameter.
>>         (create_edge_and_update_destination_phis): Ditto.
>>         (ssa_fix_duplicate_block_edges): Pass new arguments.
>>         (thread_single_edge): Ditto.
>
> This is a good and interesting catch. DOM knows how to propagate these
> context sensitive equivalences which should expose the optimizable forwarder
> blocks.
At the time I was looking into the problem, DOM couldn't understand
the equivalence.  Maybe it can be improved too.
>
> But I'm a big believer in catching as many CFG simplifications as early as
> we can as they tend to have nice cascading effects.  So if we can pick it up
> by being smarter in how we duplicate arguments, then I'm all for it.
>
>> +  for (int j = idx - 1; j >= 0; j--)
>> +    {
>> +      edge e = (*path)[j]->e;
>> +      if (e->dest == def_bb)
>> +       {
>> +         arg = gimple_phi_arg_def (def_phi, e->dest_idx);
>> +         *locus = gimple_phi_arg_location (def_phi, e->dest_idx);
>> +         return (TREE_CODE (arg) == INTEGER_CST ? arg : def);
>
> Presumably any constant that can legitimately appear in a PHI node is good
> here.  So for example ADDR_EXPR <something in static storage> ought to be
> handled as well.
>
> One could also argue that we should go ahead and do a context sensitive copy
> propagation here too if ARG turns out to be an SSA_NAME.  You have to be a
> bit more careful with those and use may_propagate_copy_p and you'd probably
> want to test the loop depth of the SSA_NAMEs to ensure you're not doing a
> propagation that is going to muck up LICM.  See loop_depth_of_name uses in
> tree-ssa-dom.c.
>
> Overall I think it's good.  We just need to resolve whether or not we want
> to catch constant ADDR_EXPRs and/or do the context sensitive copy
> propagations.
Do you mean const/copy propagation in jump threading optimization, or
just an independent opt somewhere else?  It's naturally flow sensitive
along jump threading path, which looks interesting to me.

Thanks,
bin
>
> jeff
Jeff Law April 24, 2014, 8:52 p.m. UTC | #4
On 04/17/14 04:07, Bin.Cheng wrote:
> On Thu, Apr 17, 2014 at 1:30 PM, Jeff Law <law@redhat.com> wrote:
>> On 03/18/14 04:13, bin.cheng wrote:
>>>
>>> Hi,
>>> After control flow graph change made by
>>> http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01492.html, case
>>> gcc.dg/tree-ssa/ssa-dom-thread-4.c is broken on logical_op_short_circuit
>>> targets including cortex-m3/cortex-m0.
>>> The regression reveals a missed opportunity in jump threading, which
>>> causes
>>> a forward basic block doesn't get removed in cfgcleanup after jump
>>> threading
>>> in VRP1.  Root cause is stated at the corresponding PR:
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60363, please refer to it for
>>> detailed report.
>>>
>>> This patch fixes the issue by adding constant value instead of ssa_name as
>>> the new phi argument.  Bootstrap and test on x86_64, also test on
>>> cortex-m3
>>> and the regression is gone.
>>> I think this should wait for stage1, but would like to hear some comments
>>> now.  So does it look reasonable?
>>>
>>>
>>> 2014-03-18  Bin Cheng<bin.cheng@arm.com>
>>>
>>>          PR regression/60363
>>>          * gcc/tree-ssa-threadupdate.c (get_value_locus_in_path): New.
>>>          (copy_phi_args): New parameters.  Call get_value_locus_in_path.
>>>          (update_destination_phis): New parameter.
>>>          (create_edge_and_update_destination_phis): Ditto.
>>>          (ssa_fix_duplicate_block_edges): Pass new arguments.
>>>          (thread_single_edge): Ditto.
>>
>> This is a good and interesting catch. DOM knows how to propagate these
>> context sensitive equivalences which should expose the optimizable forwarder
>> blocks.
> At the time I was looking into the problem, DOM couldn't understand
> the equivalence.  Maybe it can be improved too.
That's something I improved during 4.9 stage1 development.  There were 
cases where we clearly missed propagating a context sensitive 
equivalence into PHI nodes.  I didn't check this specific case, but it 
looks reasonably similar to the cases I fixed a few months back.

>>> +  for (int j = idx - 1; j >= 0; j--)
>>> +    {
>>> +      edge e = (*path)[j]->e;
>>> +      if (e->dest == def_bb)
>>> +       {
>>> +         arg = gimple_phi_arg_def (def_phi, e->dest_idx);
>>> +         *locus = gimple_phi_arg_location (def_phi, e->dest_idx);
>>> +         return (TREE_CODE (arg) == INTEGER_CST ? arg : def);
>>
>> Presumably any constant that can legitimately appear in a PHI node is good
>> here.  So for example ADDR_EXPR <something in static storage> ought to be
>> handled as well.
>>
>> One could also argue that we should go ahead and do a context sensitive copy
>> propagation here too if ARG turns out to be an SSA_NAME.  You have to be a
>> bit more careful with those and use may_propagate_copy_p and you'd probably
>> want to test the loop depth of the SSA_NAMEs to ensure you're not doing a
>> propagation that is going to muck up LICM.  See loop_depth_of_name uses in
>> tree-ssa-dom.c.
>>
>> Overall I think it's good.  We just need to resolve whether or not we want
>> to catch constant ADDR_EXPRs and/or do the context sensitive copy
>> propagations.
> Do you mean const/copy propagation in jump threading optimization, or
> just an independent opt somewhere else?  It's naturally flow sensitive
> along jump threading path, which looks interesting to me.
Thinking more about it, I'd probably hold off on the context sensitive 
copy propagation right now since you have to check a variety of things 
to ensure correctness and performance.  Handling that as a follow-up 
would make sense if you're interested.

So for this patch to go forward, all I think you need to do is change 
the test TREE_CODE (arg) == INTEGER_CST above to instead test 
is_gimple_min_invariant (arg), bootstrap & test.  Assuming the bootstrap 
and regressino test pass with that change, consider the patch pre-approved.

jeff
diff mbox

Patch

Index: gcc/tree-ssa-threadupdate.c
===================================================================
--- gcc/tree-ssa-threadupdate.c	(revision 208609)
+++ gcc/tree-ssa-threadupdate.c	(working copy)
@@ -403,10 +403,51 @@  copy_phi_arg_into_existing_phi (edge src_e, edge t
     }
 }
 
-/* For each PHI in BB, copy the argument associated with SRC_E to TGT_E.  */
+/* Given ssa_name DEF, backtrack jump threading PATH from node IDX
+   to see if it has constant value in a flow sensitive manner.  Set
+   LOCUS to location of the constant phi arg and return the value.
+   Return DEF directly if either PATH or idx is ZERO.  */
 
+static tree
+get_value_locus_in_path (tree def, vec<jump_thread_edge *> *path,
+			 int idx, source_location *locus)
+{
+  tree arg;
+  gimple def_phi;
+  basic_block def_bb;
+
+  if (path == NULL || idx == 0)
+    return def;
+
+  def_phi = SSA_NAME_DEF_STMT (def);
+  if (gimple_code (def_phi) != GIMPLE_PHI)
+    return def;
+
+  def_bb = gimple_bb (def_phi);
+  /* Backtrack jump threading path from IDX to see if def has constant
+     value.  */
+  for (int j = idx - 1; j >= 0; j--)
+    {
+      edge e = (*path)[j]->e;
+      if (e->dest == def_bb)
+	{
+	  arg = gimple_phi_arg_def (def_phi, e->dest_idx);
+	  *locus = gimple_phi_arg_location (def_phi, e->dest_idx);
+	  return (TREE_CODE (arg) == INTEGER_CST ? arg : def);
+	}
+    }
+
+  return def;
+}
+
+/* For each PHI in BB, copy the argument associated with SRC_E to TGT_E.
+   Try to backtrack jump threading PATH from node IDX to see if the arg
+   has constant value, copy constant value instead of argument itself
+   if yes.  */
+
 static void
-copy_phi_args (basic_block bb, edge src_e, edge tgt_e)
+copy_phi_args (basic_block bb, edge src_e, edge tgt_e,
+	       vec<jump_thread_edge *> *path, int idx)
 {
   gimple_stmt_iterator gsi;
   int src_indx = src_e->dest_idx;
@@ -414,8 +455,14 @@  static void
   for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
     {
       gimple phi = gsi_stmt (gsi);
+      tree def = gimple_phi_arg_def (phi, src_indx);
       source_location locus = gimple_phi_arg_location (phi, src_indx);
-      add_phi_arg (phi, gimple_phi_arg_def (phi, src_indx), tgt_e, locus);
+
+      if (TREE_CODE (def) == SSA_NAME
+	  && !virtual_operand_p (gimple_phi_result (phi)))
+	def = get_value_locus_in_path (def, path, idx, &locus);
+
+      add_phi_arg (phi, def, tgt_e, locus);
     }
 }
 
@@ -423,10 +470,13 @@  static void
    edges.  The copy is NEW_BB.  Every PHI node in every direct successor of
    ORIG_BB has a new argument associated with edge from NEW_BB to the
    successor.  Initialize the PHI argument so that it is equal to the PHI
-   argument associated with the edge from ORIG_BB to the successor.  */
+   argument associated with the edge from ORIG_BB to the successor.
+   PATH and IDX are used to check if the new PHI argument has constant
+   value in a flow sensitive manner.  */
 
 static void
-update_destination_phis (basic_block orig_bb, basic_block new_bb)
+update_destination_phis (basic_block orig_bb, basic_block new_bb,
+			 vec<jump_thread_edge *> *path, int idx)
 {
   edge_iterator ei;
   edge e;
@@ -434,7 +484,7 @@  static void
   FOR_EACH_EDGE (e, ei, orig_bb->succs)
     {
       edge e2 = find_edge (new_bb, e->dest);
-      copy_phi_args (e->dest, e, e2);
+      copy_phi_args (e->dest, e, e2, path, idx);
     }
 }
 
@@ -443,11 +493,13 @@  static void
    destination.
 
    Add an additional argument to any PHI nodes at the single
-   destination.  */
+   destination.  IDX is the start node in jump threading path
+   we start to check to see if the new PHI argument has constant
+   value along the jump threading path.  */
 
 static void
 create_edge_and_update_destination_phis (struct redirection_data *rd,
-					 basic_block bb)
+					 basic_block bb, int idx)
 {
   edge e = make_edge (bb, rd->path->last ()->e->dest, EDGE_FALLTHRU);
 
@@ -476,7 +528,7 @@  create_edge_and_update_destination_phis (struct re
      from the duplicate block, then we will need to add a new argument
      to them.  The argument should have the same value as the argument
      associated with the outgoing edge stored in RD.  */
-  copy_phi_args (e->dest, rd->path->last ()->e, e);
+  copy_phi_args (e->dest, rd->path->last ()->e, e, rd->path, idx);
 }
 
 /* Look through PATH beginning at START and return TRUE if there are
@@ -501,6 +553,7 @@  void
 ssa_fix_duplicate_block_edges (struct redirection_data *rd,
 			       ssa_local_info_t *local_info)
 {
+  bool mult_incomings = (rd->incoming_edges->next != NULL);
   edge e = rd->incoming_edges->e;
   vec<jump_thread_edge *> *path = THREAD_PATH (e);
 
@@ -516,8 +569,10 @@  ssa_fix_duplicate_block_edges (struct redirection_
 	  edge e2;
 
 	  /* This updates the PHIs at the destination of the duplicate
-	     block.  */
-	  update_destination_phis (local_info->bb, rd->dup_blocks[count]);
+	     block.  Pass 0 instead of i if we are threading a path which
+	     has multiple incoming edges.  */
+	  update_destination_phis (local_info->bb, rd->dup_blocks[count],
+				   path, mult_incomings ? i : 0);
 
 	  /* Find the edge from the duplicate block to the block we're
 	     threading through.  That's the edge we want to redirect.  */
@@ -535,7 +590,8 @@  ssa_fix_duplicate_block_edges (struct redirection_
 		 case), then the PHIs in the target already have the correct
 		 arguments.  */
 	      if (e2 == victim)
-		copy_phi_args (e2->dest, path->last ()->e, e2);
+		copy_phi_args (e2->dest, path->last ()->e, e2,
+			       path, mult_incomings ? i : 0);
 	    }
 	  else
 	    {
@@ -567,7 +623,8 @@  ssa_fix_duplicate_block_edges (struct redirection_
       else if ((*path)[i]->type == EDGE_COPY_SRC_BLOCK)
 	{
 	  remove_ctrl_stmt_and_useless_edges (rd->dup_blocks[count], NULL);
-	  create_edge_and_update_destination_phis (rd, rd->dup_blocks[count]);
+	  create_edge_and_update_destination_phis (rd, rd->dup_blocks[count],
+						   mult_incomings ? i : 0);
 	  if (count == 1)
 	    single_succ_edge (rd->dup_blocks[1])->aux = NULL;
 	  count++;
@@ -989,7 +1046,7 @@  thread_single_edge (edge e)
 
   create_block_for_threading (bb, &rd, 0);
   remove_ctrl_stmt_and_useless_edges (rd.dup_blocks[0], NULL);
-  create_edge_and_update_destination_phis (&rd, rd.dup_blocks[0]);
+  create_edge_and_update_destination_phis (&rd, rd.dup_blocks[0], 0);
 
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "  Threaded jump %d --> %d to %d\n",