diff mbox

[PR,tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

Message ID 5e79620e-8adb-7768-a34e-32fd286995f0@redhat.com
State New
Headers show

Commit Message

Aldy Hernandez Dec. 16, 2016, 2:41 p.m. UTC
Hi folks.

This is a follow-up on Jeff and Richi's interaction on the 
aforementioned PR here:

	https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html

I decided to explore the idea of analyzing may-undefness on-demand, 
which actually looks rather cheap.

BTW, I don't understand why we don't have auto_bitmap's, as we already 
have auto_sbitmap's.  I've implemented the former based on 
auto_sbitmap's code we already have.

The attached patch fixes the bug without introducing any regressions.

I also tested the patch by compiling 242 .ii files with -O3.  These were 
gathered from a stage1 build with -save-temps.  There is a slight time 
degradation of 4 seconds within 27 minutes of user time:

	tainted:	26:52
	orig:		26:48

This was the average aggregate time of two runs compiling all 242 .ii 
files.  IMO, this looks reasonable.  It is after all, -O3.    Is it 
acceptable?

Aldy
commit 2310bcd0e2552a40ca1de354faf005ed3e9daf4e
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Fri Dec 16 03:44:52 2016 -0500

            PR tree-optimization/71691
            * bitmap.h (class auto_bitmap): New.
            * tree-ssa-loop-unswitch.c (ssa_maybe_undefined_value_p): New.
            (tree_may_unswitch_on): Call ssa_maybe_undefined_value_p.

Comments

Jeff Law Dec. 19, 2016, 11:02 p.m. UTC | #1
On 12/16/2016 07:41 AM, Aldy Hernandez wrote:
>
> BTW, I don't understand why we don't have auto_bitmap's, as we already
> have auto_sbitmap's.  I've implemented the former based on
> auto_sbitmap's code we already have.
Trevor poked at it a bit.  bitmaps are a bit more complex than sbitmaps 
in terms of implementation details.

https://gcc.gnu.org/ml/gcc/2016-04/msg00013.html

But his suggestion was to first create auto_bitmap, then look to convert 
to using that as opposed to his other approaches.


>
> The attached patch fixes the bug without introducing any regressions.
>
> I also tested the patch by compiling 242 .ii files with -O3.  These were
> gathered from a stage1 build with -save-temps.  There is a slight time
> degradation of 4 seconds within 27 minutes of user time:
>
>     tainted:    26:52
>     orig:        26:48
>
> This was the average aggregate time of two runs compiling all 242 .ii
> files.  IMO, this looks reasonable.  It is after all, -O3.    Is it
> acceptable?
>
> Aldy
>
> curr
>
>
> commit 2310bcd0e2552a40ca1de354faf005ed3e9daf4e
> Author: Aldy Hernandez <aldyh@redhat.com>
> Date:   Fri Dec 16 03:44:52 2016 -0500
>
>             PR tree-optimization/71691
>             * bitmap.h (class auto_bitmap): New.
>             * tree-ssa-loop-unswitch.c (ssa_maybe_undefined_value_p): New.
>             (tree_may_unswitch_on): Call ssa_maybe_undefined_value_p.
So I'm going to defer to Richi since he was reviewing my original 
attempt in this space.

It probably doesn't matter in practice (when I looked at this I couldn't 
get into the code in question with a -O3 bootstrap or with the 
testsuite, just with the testcase in the BZ) but you might consider 
handling an already visited node slightly differently.

If the the node was visited and resolved as undefined, then we would 
have already exited the loop.

If the node was visited and resolved as defined, then we could just keep 
processing other items on the the worklist.

The case where you want to conservatively return false is when you're 
actively processing the name in question.

Jeff
Richard Biener Dec. 20, 2016, 2:16 p.m. UTC | #2
On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> Hi folks.
>
> This is a follow-up on Jeff and Richi's interaction on the aforementioned PR
> here:
>
>         https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html
>
> I decided to explore the idea of analyzing may-undefness on-demand, which
> actually looks rather cheap.
>
> BTW, I don't understand why we don't have auto_bitmap's, as we already have
> auto_sbitmap's.  I've implemented the former based on auto_sbitmap's code we
> already have.
>
> The attached patch fixes the bug without introducing any regressions.
>
> I also tested the patch by compiling 242 .ii files with -O3.  These were
> gathered from a stage1 build with -save-temps.  There is a slight time
> degradation of 4 seconds within 27 minutes of user time:
>
>         tainted:        26:52
>         orig:           26:48
>
> This was the average aggregate time of two runs compiling all 242 .ii files.
> IMO, this looks reasonable.  It is after all, -O3.    Is it acceptable?

+  while (!worklist.is_empty ())
+    {
+      name = worklist.pop ();
+      gcc_assert (TREE_CODE (name) == SSA_NAME);
+
+      if (ssa_undefined_value_p (name, true))
+       return true;
+
+      bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));

it should be already set as we use visited_ssa as "was it ever on the worklist",
so maybe renaming it would be a good thing as well.

+             if (TREE_CODE (name) == SSA_NAME)
+               {
+                 /* If an SSA has already been seen, this may be a loop.
+                    Fail conservatively.  */
+                 if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
+                   return false;

so to me "conservative" is returning true, not false.

+                 bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+                 worklist.safe_push (name);

but for loops we can just continue and ignore this use.  And bitmap_set_bit
returns whether it set a bit, thus

                if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)))
                  worklist.safe_push (name);

should work?

+      /* Check that any SSA names used to define NAME is also fully
+        defined.  */
+      use_operand_p use_p;
+      ssa_op_iter iter;
+      FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
+       {
+         name = USE_FROM_PTR (use_p);
+         if (TREE_CODE (name) == SSA_NAME)

always true.

+           {
+             /* If an SSA has already been seen, this may be a loop.
+                Fail conservatively.  */
+             if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
+               return false;
+             bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+             worklist.safe_push (name);

See above.

In principle the thing is sound but I'd like to be able to pass in a bitmap of
known maybe-undefined/must-defined SSA names to have a cache for
multiple invocations from within a pass (like this unswitching case).

Also once you hit defs that are in a post-dominated region of the loop entry
you can treat them as not undefined (as their use invokes undefined
behavior).  This is also how you treat function parameters (well,
ssa_undefined_value_p does), where the call site invokes undefined behavior
when passing in undefined values.  So we need an extra parameter specifying
the post-dominance region.

You do not handle memory or calls conservatively which means the existing
testcase only needs some obfuscation to become a problem again.  To fix
that before /* Check that any SSA names used to define NAME is also fully
defined.  */ bail out conservatively, like

   if (! is_gimple_assign (def)
      || gimple_assign_single_p (def))
    return true;

Only unswitching on conditions that post-dominate the loop entry might be a
simpler solution for the PR in question.  OTOH this may disable too much
unswitching in practice.

Richard.

> Aldy
Jeff Law Dec. 20, 2016, 3:50 p.m. UTC | #3
On 12/20/2016 07:16 AM, Richard Biener wrote:
>
> it should be already set as we use visited_ssa as "was it ever on the worklist",
> so maybe renaming it would be a good thing as well.
>
> +             if (TREE_CODE (name) == SSA_NAME)
> +               {
> +                 /* If an SSA has already been seen, this may be a loop.
> +                    Fail conservatively.  */
> +                 if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
> +                   return false;
>
> so to me "conservative" is returning true, not false.
Agreed.

>
> +                 bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
> +                 worklist.safe_push (name);
>
> but for loops we can just continue and ignore this use.  And bitmap_set_bit
> returns whether it set a bit, thus
>
>                 if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)))
>                   worklist.safe_push (name);
>
> should work?
What if we're in an irreducible region?


>
> In principle the thing is sound but I'd like to be able to pass in a bitmap of
> known maybe-undefined/must-defined SSA names to have a cache for
> multiple invocations from within a pass (like this unswitching case).
So that means keeping another bitmap for things positively identified as 
defined, then saving it for later invocations.  So maybe some of my work 
+ some of his melded together.  (mine computes once for the entire FN 
and saves the result, so converting it to on-demand saving the result 
shouldn't be terrible).


>
> Only unswitching on conditions that post-dominate the loop entry might be a
> simpler solution for the PR in question.  OTOH this may disable too much
> unswitching in practice.
It might be worth experimentation.

Jeff
Richard Biener Dec. 20, 2016, 5:33 p.m. UTC | #4
On December 20, 2016 4:50:25 PM GMT+01:00, Jeff Law <law@redhat.com> wrote:
>On 12/20/2016 07:16 AM, Richard Biener wrote:
>>
>> it should be already set as we use visited_ssa as "was it ever on the
>worklist",
>> so maybe renaming it would be a good thing as well.
>>
>> +             if (TREE_CODE (name) == SSA_NAME)
>> +               {
>> +                 /* If an SSA has already been seen, this may be a
>loop.
>> +                    Fail conservatively.  */
>> +                 if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION
>(name)))
>> +                   return false;
>>
>> so to me "conservative" is returning true, not false.
>Agreed.
>
>>
>> +                 bitmap_set_bit (visited_ssa, SSA_NAME_VERSION
>(name));
>> +                 worklist.safe_push (name);
>>
>> but for loops we can just continue and ignore this use.  And
>bitmap_set_bit
>> returns whether it set a bit, thus
>>
>>                 if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION
>(name)))
>>                   worklist.safe_push (name);
>>
>> should work?
>What if we're in an irreducible region?

Handling all back edges by deferring to their defs should work.  At least I can't see how it would not.

>
>>
>> In principle the thing is sound but I'd like to be able to pass in a
>bitmap of
>> known maybe-undefined/must-defined SSA names to have a cache for
>> multiple invocations from within a pass (like this unswitching case).
>So that means keeping another bitmap for things positively identified
>as 
>defined, then saving it for later invocations.

We eventually need a tristate here for maximum caching.  And as the result depends on the dominating BB of the postdom region the savings might be questionable.

>  So maybe some of my
>work 
>+ some of his melded together.  (mine computes once for the entire FN 
>and saves the result, so converting it to on-demand saving the result 
>shouldn't be terrible).
>
>
>>
>> Only unswitching on conditions that post-dominate the loop entry
>might be a
>> simpler solution for the PR in question.  OTOH this may disable too
>much
>> unswitching in practice.
>It might be worth experimentation.
>
>Jeff
Jeff Law Dec. 21, 2016, 5:47 p.m. UTC | #5
On 12/20/2016 10:33 AM, Richard Biener wrote:
>>>
>>> but for loops we can just continue and ignore this use.  And
>> bitmap_set_bit
>>> returns whether it set a bit, thus
>>>
>>>                 if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION
>> (name)))
>>>                   worklist.safe_push (name);
>>>
>>> should work?
>> What if we're in an irreducible region?
>
> Handling all back edges by deferring to their defs should work.  At least I can't see how it would not.
I'll take your word for it -- I haven't thought deeply about this.



>
>>
>>>
>>> In principle the thing is sound but I'd like to be able to pass in a
>> bitmap of
>>> known maybe-undefined/must-defined SSA names to have a cache for
>>> multiple invocations from within a pass (like this unswitching case).
>> So that means keeping another bitmap for things positively identified
>> as
>> defined, then saving it for later invocations.
>
> We eventually need a tristate here for maximum caching.  And as the result depends on the dominating BB of the postdom region the savings might be questionable.
Possibly.

Jeff
Aldy Hernandez Dec. 21, 2016, 6:43 p.m. UTC | #6
On 12/20/2016 09:16 AM, Richard Biener wrote:

> You do not handle memory or calls conservatively which means the existing
> testcase only needs some obfuscation to become a problem again.  To fix
> that before /* Check that any SSA names used to define NAME is also fully
> defined.  */ bail out conservatively, like
>
>    if (! is_gimple_assign (def)
>       || gimple_assign_single_p (def))
>     return true;

I understand the !is_gimple_assign, which will skip over GIMPLE_CALLs 
setting a value, but the "gimple_assign_single_p(def) == true"??

Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? 
Don't we want to follow up the def chain precisely on these?

Thanks.
Aldy
Richard Biener Jan. 4, 2017, 11:42 a.m. UTC | #7
On Wed, Dec 21, 2016 at 7:43 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> On 12/20/2016 09:16 AM, Richard Biener wrote:
>
>> You do not handle memory or calls conservatively which means the existing
>> testcase only needs some obfuscation to become a problem again.  To fix
>> that before /* Check that any SSA names used to define NAME is also fully
>> defined.  */ bail out conservatively, like
>>
>>    if (! is_gimple_assign (def)
>>       || gimple_assign_single_p (def))
>>     return true;
>
>
> I understand the !is_gimple_assign, which will skip over GIMPLE_CALLs
> setting a value, but the "gimple_assign_single_p(def) == true"??
>
> Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't we
> want to follow up the def chain precisely on these?

For the first (= e) we can't, for the second, yes, the predicate is too strict.
But it's conservative ;)

Likewise for e_2 = VIEW_CONVERT_EXPR <u_88> btw, or REAL/IMAGPART_EXPR.

Note both VIEW_CONVERT_EXPR and REAL/IMAGPART_EXPR will also appear
in RHSs we can't handle.

Richard.

>
> Thanks.
> Aldy
Trevor Saunders Jan. 5, 2017, 4:16 a.m. UTC | #8
On Fri, Dec 16, 2016 at 09:41:33AM -0500, Aldy Hernandez wrote:
> Hi folks.
> 
> This is a follow-up on Jeff and Richi's interaction on the aforementioned PR
> here:
> 
> 	https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html
> 
> I decided to explore the idea of analyzing may-undefness on-demand, which
> actually looks rather cheap.
> 
> BTW, I don't understand why we don't have auto_bitmap's, as we already have
> auto_sbitmap's.  I've implemented the former based on auto_sbitmap's code we
> already have.

No good reason, it just hasn't been done yet.

I'm tempted to fit both this and auto_sbitmap into a unique_ptr with
custom deletor, but that would make it hard to do small object
optimizations, so maybe it isn't actually a great idea.

> +class auto_bitmap
> +{
> + public:
> +  auto_bitmap () { bits = BITMAP_ALLOC (NULL); }
> +  ~auto_bitmap () { BITMAP_FREE (bits); }

We could make the member a bitmap_head, and then use bitmap_initialize
and bitmap_clear in the ctor and dtor, that'd save a alloc / dealloc.

You might also want to use the CXX_MEM_STAT macros on the ctor to get
better attribution of memory usage.

> +  // Prevent making a copy that references our bitmap.
> +  auto_bitmap (const auto_bitmap &);
> +  auto_bitmap &operator = (const auto_bitmap &);
> +#if __cplusplus >= 201103L
> +  auto_bitmap (auto_bitmap &&);
> +  auto_bitmap &operator = (auto_bitmap &&);

We could actually support moving bitmaps pretty easily, though we would
need to do the auto_ptr style hacks to keep building as c++98.  I'll
probably do that work eventually for unique_ptr, but its kind of late
for gcc 8 unless we just use it to fix memory leaks.

> +#endif
> +
> +  bitmap bits;

style guide would say that should be m_bits I believe.

Trev

sorry about the lag here.
diff mbox

Patch

diff --git a/gcc/bitmap.h b/gcc/bitmap.h
index 1b2c8e0..bc32a21 100644
--- a/gcc/bitmap.h
+++ b/gcc/bitmap.h
@@ -802,4 +802,25 @@  bmp_iter_and_compl (bitmap_iterator *bi, unsigned *bit_no)
        bmp_iter_and_compl (&(ITER), &(BITNUM));				\
        bmp_iter_next (&(ITER), &(BITNUM)))
 
+/* A class that ties the lifetime of a bitmap to its scope.  */
+class auto_bitmap
+{
+ public:
+  auto_bitmap () { bits = BITMAP_ALLOC (NULL); }
+  ~auto_bitmap () { BITMAP_FREE (bits); }
+  // Allow calling bitmap functions on our bitmap.
+  operator bitmap () { return bits; }
+
+ private:
+  // Prevent making a copy that references our bitmap.
+  auto_bitmap (const auto_bitmap &);
+  auto_bitmap &operator = (const auto_bitmap &);
+#if __cplusplus >= 201103L
+  auto_bitmap (auto_bitmap &&);
+  auto_bitmap &operator = (auto_bitmap &&);
+#endif
+
+  bitmap bits;
+};
+
 #endif /* GCC_BITMAP_H */
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr71691.c b/gcc/testsuite/gcc.c-torture/execute/pr71691.c
new file mode 100644
index 0000000..1ffd1a2
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr71691.c
@@ -0,0 +1,47 @@ 
+/* { dg-options "-fno-tree-vrp" } */
+
+char b;
+short f;
+unsigned e;
+int g = 20;
+
+void
+foo ()
+{
+  int l, h;
+  for (l = 0; l <= 7; l++)
+    {
+      int j = 38;
+      if (g)
+	h = 0;
+      for (; h <= 7; h++)
+	{
+	  int i, k = b % (j % 4);
+	  g = f;
+	  for (;;)
+	    {
+	      j = 6 || b;
+	      if (e)
+		{
+		  for (; j; --j)
+		    if (k)
+		      __builtin_printf ("%d", 9);
+		  if (i)
+		    __builtin_printf ("%d", j);
+		}
+	      if (l)
+		continue;
+	      break;
+	    }
+	  i = f || b;
+	}
+    }
+}
+
+int
+main ()
+{
+  foo ();
+  return 0;
+}
+
diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
index 40905af..bc34395 100644
--- a/gcc/tree-ssa-loop-unswitch.c
+++ b/gcc/tree-ssa-loop-unswitch.c
@@ -109,6 +109,71 @@  tree_ssa_unswitch_loops (void)
   return 0;
 }
 
+/* Return TRUE if an SSA_NAME may be undefined.  */
+
+static bool
+ssa_maybe_undefined_value_p (tree name)
+{
+  /* If it's obviously undefined, avoid further computations.  */
+  if (ssa_undefined_value_p (name, true))
+    return true;
+
+  auto_bitmap visited_ssa;
+  auto_vec<tree> worklist;
+  worklist.safe_push (name);
+  while (!worklist.is_empty ())
+    {
+      name = worklist.pop ();
+      gcc_assert (TREE_CODE (name) == SSA_NAME);
+
+      if (ssa_undefined_value_p (name, true))
+	return true;
+
+      bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+
+      /* Check that all the PHI args are fully defined.  */
+      gimple *def = SSA_NAME_DEF_STMT (name);
+      if (gphi *phi = dyn_cast <gphi *> (def))
+	{
+	  if (virtual_operand_p (PHI_RESULT (phi)))
+	    continue;
+	  for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+	    {
+	      name = gimple_phi_arg_def (phi, i);
+	      if (TREE_CODE (name) == SSA_NAME)
+		{
+		  /* If an SSA has already been seen, this may be a loop.
+		     Fail conservatively.  */
+		  if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
+		    return false;
+		  bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+		  worklist.safe_push (name);
+		}
+	    }
+	  continue;
+	}
+
+      /* Check that any SSA names used to define NAME is also fully
+	 defined.  */
+      use_operand_p use_p;
+      ssa_op_iter iter;
+      FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE)
+	{
+	  name = USE_FROM_PTR (use_p);
+	  if (TREE_CODE (name) == SSA_NAME)
+	    {
+	      /* If an SSA has already been seen, this may be a loop.
+		 Fail conservatively.  */
+	      if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name)))
+		return false;
+	      bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name));
+	      worklist.safe_push (name);
+	    }
+	}
+    }
+  return false;
+}
+
 /* Checks whether we can unswitch LOOP on condition at end of BB -- one of its
    basic blocks (for what it means see comments below).  */
 
@@ -138,7 +203,7 @@  tree_may_unswitch_on (basic_block bb, struct loop *loop)
     {
       /* Unswitching on undefined values would introduce undefined
 	 behavior that the original program might never exercise.  */
-      if (ssa_undefined_value_p (use, true))
+      if (ssa_maybe_undefined_value_p (use))
 	return NULL_TREE;
       def = SSA_NAME_DEF_STMT (use);
       def_bb = gimple_bb (def);