[PR,tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)
diff mbox

Message ID f64704f5-fafd-5eba-6c60-568870402703@redhat.com
State New
Headers show

Commit Message

Aldy Hernandez Jan. 30, 2017, 5:19 p.m. UTC
On 01/30/2017 10:03 AM, Richard Biener wrote:
> On Fri, Jan 27, 2017 at 12:20 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>> On 01/26/2017 07:29 AM, Richard Biener wrote:
>>>
>>> On Thu, Jan 26, 2017 at 1:04 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>>>>
>>>> On 01/24/2017 07:23 AM, Richard Biener wrote:
>>
>>
>>> Your initial on-demand approach is fine to catch some of the cases but it
>>> will not handle those for which we'd need post-dominance.
>>>
>>> I guess we can incrementally add that.
>>
>>
>> No complaints from me.
>>
>> This is my initial on-demand approach, with a few fixes you've commented on
>> throughout.
>>
>> As you can see, there is still an #if 0 wrt to using your suggested
>> conservative handling of memory loads, which I'm not entirely convinced of,
>> as it pessimizes gcc.dg/loop-unswitch-1.c.  If you feel strongly about it, I
>> can enable the code again.
>
> It is really required -- fortunately loop-unswitch-1.c is one of the cases where
> the post-dom / always-executed bits help .  The comparison is inside the
> loop header and thus always executed when the loop enters, so inserrting it
> on the preheader edge is fine.

Left as is then.

>
>> Also, I enhanced gcc.dg/loop-unswitch-1.c to verify that we're actually
>> unswitching something.  It seems kinda silly that we have various unswitch
>> tests, but we don't actually check whether we have unswitched anything.
>
> Heh.  It probably was added for an ICE...
>
>> This test was the only one in the *unswitch*.c set that I saw was actually
>> being unswitched.  Of course, if you don't agree with my #if 0 above, I will
>> remove this change to the test.
>>
>> Finally, are we even guaranteed to unswitch in loop-unswitch-1.c across
>> architectures?  If not, again, I can remove the one-liner.
>
> I think so.

Left as well.

>
>>
>> How does this look for trunk?
>
> With a unswitch-local solution I meant to not add new files but put the
> defined_or_undefined class (well, or rather a single function...) into
> tree-ssa-loop-unswitch.c.

Done.

>
> @@ -138,7 +141,7 @@ tree_may_unswitch_on (basic_block bb, struct loop *loop)
>      {
>        /* Unswitching on undefined values would introduce undefined
>          behavior that the original program might never exercise.  */
> -      if (ssa_undefined_value_p (use, true))
> +      if (defined_or_undefined->is_maybe_undefined (use))
>         return NULL_TREE;
>        def = SSA_NAME_DEF_STMT (use);
>        def_bb = gimple_bb (def);
>
> as I said, moving this (now more expensive check) after
>
>       if (def_bb
>           && flow_bb_inside_loop_p (loop, def_bb))
>         return NULL_TREE;
>
> this cheap check would be better.  It should avoid 99% of all calls I bet.

Done.

>
> You can recover the loop-unswitch-1.c case by passing down
> the using stmt and checking its BB against loop_header (the only
> block that we trivially know is always executed when entering the region).
> Or do that check in the caller, like
>
>         if (bb != loop->header
>            && ssa_undefined_value_p (use, true) /
> defined_or_undefined->is_maybe_undefined (use))

Done in callee.

>
> +      gimple *def = SSA_NAME_DEF_STMT (t);
> +
> +      /* Check that all the PHI args are fully defined.  */
> +      if (gphi *phi = dyn_cast <gphi *> (def))
> +       {
> +         if (virtual_operand_p (PHI_RESULT (phi)))
> +           continue;
>
> You should never run into virtual operands (you only walk SSA_OP_USEs).

Done.

>
> You can stop walking at stmts that dominate the region header,
> like with
>
> +      gimple *def = SSA_NAME_DEF_STMT (t);
>         /* Uses in stmts always executed when the region header
> executes are fine.  */
>         if (dominated_by_p (CDI_DOMINATORS, loop_header, gimple_bb (def)))
>           continue;

Hmmmm... doing this causes the PR testcase (gcc.dg/loop-unswitch-5.c in 
the attached patch to FAIL).  I haven't looked at it, but I seem to 
recall in the testcase that we could have a DEF that dominated the loop 
but was a mess of PHI's, some of whose args were undefined.

Did you perhaps want to put that dominated_by_p call after the PHI arg 
checks (which seems to work)?:

       /* Check that all the PHI args are fully defined.  */
       if (gphi *phi = dyn_cast <gphi *> (def))
...
...
...

+      /* Uses in stmts always executed when the region header executes
+        are fine.  */
+      if (dominated_by_p (CDI_DOMINATORS, loop->header, gimple_bb (def)))
+       continue;
+
        /* Handle calls and memory loads conservatively.  */
        if (!is_gimple_assign (def)
           || (gimple_assign_single_p (def)

Until this is clear, I've left this dominated_by_p call #if 0'ed out.

>
> and the bail out for PARM_DECLs is wrong:
>
> +      /* A PARM_DECL will not have an SSA_NAME_DEF_STMT.  Parameters
> +        get their initial value from function entry.  */
> +      if (SSA_NAME_VAR (t) && TREE_CODE (SSA_NAME_VAR (t)) == PARM_DECL)
> +       return false;
>
> needs to be a continue; rather than a return false.

Done.

Preliminary test show the attached patch works.  Further tests on-going.

Aldy
gcc/

	PR tree-optimization/71691
	* bitmap.h (class auto_bitmap): New.
	* tree-ssa-loop-unswitch.c (tree_may_unswitch_on): Call
	is_maybe_undefined instead of ssa_undefined_value_p.

gcc/testsuite/

	PR tree-optimization/71691
	* gcc.dg/loop-unswitch-5.c: Test that we actually unswitch a loop.

Comments

Richard Biener Jan. 31, 2017, 8:02 a.m. UTC | #1
On Mon, Jan 30, 2017 at 6:19 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> On 01/30/2017 10:03 AM, Richard Biener wrote:
>>
>> On Fri, Jan 27, 2017 at 12:20 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>>>
>>> On 01/26/2017 07:29 AM, Richard Biener wrote:
>>>>
>>>>
>>>> On Thu, Jan 26, 2017 at 1:04 PM, Aldy Hernandez <aldyh@redhat.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> On 01/24/2017 07:23 AM, Richard Biener wrote:
>>>
>>>
>>>
>>>> Your initial on-demand approach is fine to catch some of the cases but
>>>> it
>>>> will not handle those for which we'd need post-dominance.
>>>>
>>>> I guess we can incrementally add that.
>>>
>>>
>>>
>>> No complaints from me.
>>>
>>> This is my initial on-demand approach, with a few fixes you've commented
>>> on
>>> throughout.
>>>
>>> As you can see, there is still an #if 0 wrt to using your suggested
>>> conservative handling of memory loads, which I'm not entirely convinced
>>> of,
>>> as it pessimizes gcc.dg/loop-unswitch-1.c.  If you feel strongly about
>>> it, I
>>> can enable the code again.
>>
>>
>> It is really required -- fortunately loop-unswitch-1.c is one of the cases
>> where
>> the post-dom / always-executed bits help .  The comparison is inside the
>> loop header and thus always executed when the loop enters, so inserrting
>> it
>> on the preheader edge is fine.
>
>
> Left as is then.
>
>>
>>> Also, I enhanced gcc.dg/loop-unswitch-1.c to verify that we're actually
>>> unswitching something.  It seems kinda silly that we have various
>>> unswitch
>>> tests, but we don't actually check whether we have unswitched anything.
>>
>>
>> Heh.  It probably was added for an ICE...
>>
>>> This test was the only one in the *unswitch*.c set that I saw was
>>> actually
>>> being unswitched.  Of course, if you don't agree with my #if 0 above, I
>>> will
>>> remove this change to the test.
>>>
>>> Finally, are we even guaranteed to unswitch in loop-unswitch-1.c across
>>> architectures?  If not, again, I can remove the one-liner.
>>
>>
>> I think so.
>
>
> Left as well.
>
>>
>>>
>>> How does this look for trunk?
>>
>>
>> With a unswitch-local solution I meant to not add new files but put the
>> defined_or_undefined class (well, or rather a single function...) into
>> tree-ssa-loop-unswitch.c.
>
>
> Done.
>
>>
>> @@ -138,7 +141,7 @@ tree_may_unswitch_on (basic_block bb, struct loop
>> *loop)
>>      {
>>        /* Unswitching on undefined values would introduce undefined
>>          behavior that the original program might never exercise.  */
>> -      if (ssa_undefined_value_p (use, true))
>> +      if (defined_or_undefined->is_maybe_undefined (use))
>>         return NULL_TREE;
>>        def = SSA_NAME_DEF_STMT (use);
>>        def_bb = gimple_bb (def);
>>
>> as I said, moving this (now more expensive check) after
>>
>>       if (def_bb
>>           && flow_bb_inside_loop_p (loop, def_bb))
>>         return NULL_TREE;
>>
>> this cheap check would be better.  It should avoid 99% of all calls I bet.
>
>
> Done.
>
>>
>> You can recover the loop-unswitch-1.c case by passing down
>> the using stmt and checking its BB against loop_header (the only
>> block that we trivially know is always executed when entering the region).
>> Or do that check in the caller, like
>>
>>         if (bb != loop->header
>>            && ssa_undefined_value_p (use, true) /
>> defined_or_undefined->is_maybe_undefined (use))
>
>
> Done in callee.
>
>>
>> +      gimple *def = SSA_NAME_DEF_STMT (t);
>> +
>> +      /* Check that all the PHI args are fully defined.  */
>> +      if (gphi *phi = dyn_cast <gphi *> (def))
>> +       {
>> +         if (virtual_operand_p (PHI_RESULT (phi)))
>> +           continue;
>>
>> You should never run into virtual operands (you only walk SSA_OP_USEs).
>
>
> Done.
>
>>
>> You can stop walking at stmts that dominate the region header,
>> like with
>>
>> +      gimple *def = SSA_NAME_DEF_STMT (t);
>>         /* Uses in stmts always executed when the region header
>> executes are fine.  */
>>         if (dominated_by_p (CDI_DOMINATORS, loop_header, gimple_bb (def)))
>>           continue;
>
>
> Hmmmm... doing this causes the PR testcase (gcc.dg/loop-unswitch-5.c in the
> attached patch to FAIL).  I haven't looked at it, but I seem to recall in
> the testcase that we could have a DEF that dominated the loop but was a mess
> of PHI's, some of whose args were undefined.
>
> Did you perhaps want to put that dominated_by_p call after the PHI arg
> checks (which seems to work)?:

Ah, yes - of course.  PHIs are not really "defs".

>       /* Check that all the PHI args are fully defined.  */
>       if (gphi *phi = dyn_cast <gphi *> (def))
> ...
> ...
> ...
>
> +      /* Uses in stmts always executed when the region header executes
> +        are fine.  */
> +      if (dominated_by_p (CDI_DOMINATORS, loop->header, gimple_bb (def)))
> +       continue;
> +
>        /* Handle calls and memory loads conservatively.  */
>        if (!is_gimple_assign (def)
>           || (gimple_assign_single_p (def)
>
> Until this is clear, I've left this dominated_by_p call #if 0'ed out.
>
>>
>> and the bail out for PARM_DECLs is wrong:
>>
>> +      /* A PARM_DECL will not have an SSA_NAME_DEF_STMT.  Parameters
>> +        get their initial value from function entry.  */
>> +      if (SSA_NAME_VAR (t) && TREE_CODE (SSA_NAME_VAR (t)) == PARM_DECL)
>> +       return false;
>>
>> needs to be a continue; rather than a return false.
>
>
> Done.
>
> Preliminary test show the attached patch works.  Further tests on-going.

Looks good now, with the #if 0 dominance check moved after the PHI handling.

Thus, ok for trunk if the fixed variant passes testing.

Thanks,
Richard.

> Aldy

Patch
diff mbox

diff --git a/gcc/bitmap.h b/gcc/bitmap.h
index 196738b..f158b44 100644
--- a/gcc/bitmap.h
+++ b/gcc/bitmap.h
@@ -802,4 +802,25 @@  bmp_iter_and_compl (bitmap_iterator *bi, unsigned *bit_no)
        bmp_iter_and_compl (&(ITER), &(BITNUM));				\
        bmp_iter_next (&(ITER), &(BITNUM)))
 
+/* A class that ties the lifetime of a bitmap to its scope.  */
+class auto_bitmap
+{
+ public:
+  auto_bitmap () { bits = BITMAP_ALLOC (NULL); }
+  ~auto_bitmap () { BITMAP_FREE (bits); }
+  // Allow calling bitmap functions on our bitmap.
+  operator bitmap () { return bits; }
+
+ private:
+  // Prevent making a copy that references our bitmap.
+  auto_bitmap (const auto_bitmap &);
+  auto_bitmap &operator = (const auto_bitmap &);
+#if __cplusplus >= 201103L
+  auto_bitmap (auto_bitmap &&);
+  auto_bitmap &operator = (auto_bitmap &&);
+#endif
+
+  bitmap bits;
+};
+
 #endif /* GCC_BITMAP_H */
diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-1.c b/gcc/testsuite/gcc.dg/loop-unswitch-1.c
index 930364c..f6fc41d 100644
--- a/gcc/testsuite/gcc.dg/loop-unswitch-1.c
+++ b/gcc/testsuite/gcc.dg/loop-unswitch-1.c
@@ -1,6 +1,6 @@ 
 /* For PR rtl-optimization/27735  */
 /* { dg-do compile } */
-/* { dg-options "-O2 -funswitch-loops" } */
+/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-details" } */
 
 void set_color(void);
 void xml_colorize_line(unsigned int *p, int state)
@@ -32,3 +32,5 @@  parse_tag: ;
     }
 }
 
+/* Test that we actually unswitched something.  */
+/* { dg-final { scan-tree-dump ";; Unswitching loop" "unswitch" } } */
diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-5.c b/gcc/testsuite/gcc.dg/loop-unswitch-5.c
new file mode 100644
index 0000000..b41e853
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/loop-unswitch-5.c
@@ -0,0 +1,51 @@ 
+/* PR middle-end/71691 */
+/* { dg-do run } */
+/* { dg-options "-fno-tree-vrp -O2 -funswitch-loops -fdump-tree-unswitch-details" } */
+
+/* Note: The -fno-tree-vrp above is only there to avoid VRP papering
+   over the problem.  */
+
+char b;
+short f;
+unsigned e;
+int g = 20;
+
+void
+foo ()
+{
+  int l, h;
+  for (l = 0; l <= 7; l++)
+    {
+      int j = 38;
+      if (g)
+	h = 0;
+      for (; h <= 7; h++)
+	{
+	  int i, k = b % (j % 4);
+	  g = f;
+	  for (;;)
+	    {
+	      j = 6 || b;
+	      if (e)
+		{
+		  for (; j; --j)
+		    if (k)
+		      __builtin_printf ("%d", 9);
+		  if (i)
+		    __builtin_printf ("%d", j);
+		}
+	      if (l)
+		continue;
+	      break;
+	    }
+	  i = f || b;
+	}
+    }
+}
+
+int
+main ()
+{
+  foo ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
index 92599fb..28ae7dc 100644
--- a/gcc/tree-ssa-loop-unswitch.c
+++ b/gcc/tree-ssa-loop-unswitch.c
@@ -109,6 +109,84 @@  tree_ssa_unswitch_loops (void)
   return 0;
 }
 
+/* Return TRUE if an SSA_NAME maybe undefined and is therefore
+   unsuitable for unswitching.  STMT is the statement we are
+   considering for unswitching and LOOP is the loop it appears in.  */
+
+static bool
+is_maybe_undefined (const tree name, gimple *stmt, struct loop *loop)
+{
+  /* The loop header is the only block we can trivially determine that
+     will always be executed.  If the comparison is in the loop
+     header, we know it's OK to unswitch on it.  */
+  if (gimple_bb (stmt) == loop->header)
+    return false;
+
+  auto_bitmap visited_ssa;
+  auto_vec<tree> worklist;
+  worklist.safe_push (name);
+  bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+  while (!worklist.is_empty ())
+    {
+      tree t = worklist.pop ();
+
+      /* If it's obviously undefined, avoid further computations.  */
+      if (ssa_undefined_value_p (t, true))
+	return true;
+
+      /* A PARM_DECL will not have an SSA_NAME_DEF_STMT.  Parameters
+	 get their initial value from function entry.  */
+      if (SSA_NAME_VAR (t) && TREE_CODE (SSA_NAME_VAR (t)) == PARM_DECL)
+	continue;
+
+      gimple *def = SSA_NAME_DEF_STMT (t);
+
+#if 0
+      /* Uses in stmts always executed when the region header executes
+	 are fine.  */
+      if (dominated_by_p (CDI_DOMINATORS, loop->header, gimple_bb (def)))
+	continue;
+#endif
+
+      /* Check that all the PHI args are fully defined.  */
+      if (gphi *phi = dyn_cast <gphi *> (def))
+	{
+	  for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+	    {
+	      tree t = gimple_phi_arg_def (phi, i);
+	      /* If an SSA has already been seen, it may be a loop,
+		 but we can continue and ignore this use.  Otherwise,
+		 add the SSA_NAME to the queue and visit it later.  */
+	      if (TREE_CODE (t) == SSA_NAME
+		  && bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (t)))
+		worklist.safe_push (t);
+	    }
+	  continue;
+	}
+
+      /* Handle calls and memory loads conservatively.  */
+      if (!is_gimple_assign (def)
+	  || (gimple_assign_single_p (def)
+	      && gimple_vuse (def)))
+	return true;
+
+      /* Check that any SSA names used to define NAME are also fully
+	 defined.  */
+      use_operand_p use_p;
+      ssa_op_iter iter;
+      FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
+	{
+	  tree t = USE_FROM_PTR (use_p);
+	  /* If an SSA has already been seen, it may be a loop,
+	     but we can continue and ignore this use.  Otherwise,
+	     add the SSA_NAME to the queue and visit it later.  */
+	  if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (t)))
+	    worklist.safe_push (t);
+	}
+    }
+  return false;
+}
+
 /* Checks whether we can unswitch LOOP on condition at end of BB -- one of its
    basic blocks (for what it means see comments below).  */
 
@@ -136,15 +214,15 @@  tree_may_unswitch_on (basic_block bb, struct loop *loop)
   /* Condition must be invariant.  */
   FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE)
     {
-      /* Unswitching on undefined values would introduce undefined
-	 behavior that the original program might never exercise.  */
-      if (ssa_undefined_value_p (use, true))
-	return NULL_TREE;
       def = SSA_NAME_DEF_STMT (use);
       def_bb = gimple_bb (def);
       if (def_bb
 	  && flow_bb_inside_loop_p (loop, def_bb))
 	return NULL_TREE;
+      /* Unswitching on undefined values would introduce undefined
+	 behavior that the original program might never exercise.  */
+      if (is_maybe_undefined (use, stmt, loop))
+	return NULL_TREE;
     }
 
   cond = build2 (gimple_cond_code (stmt), boolean_type_node,