
Improve VRP assert creation for loops

Message ID 20131106171342.GF27813@tucnak.zalov.cz

Commit Message

Jakub Jelinek Nov. 6, 2013, 5:13 p.m. UTC
On Tue, Nov 05, 2013 at 02:00:16PM -0800, Cong Hou wrote:
> > I'm also curious -- did this code show up in a particular benchmark, if so,
> > which one?
> 
> I didn't find this problem from any benchmark, but from another
> concern about loop upper bound estimation. Look at the following code:
> 
> int foo(unsigned int n, int r)
> {
>   int i;
>   if (n > 0)
>     if (n < 4)
>     {
>       do {
>          --n;
>          r *= 2;
>       } while (n > 0);
>     }
>   return r+n;
> }
> 
> 
> In order to get the upper bound of the loop in this function, GCC
> traverses conditions n<4 and n>0 separately and tries to get any
> useful information. But as those two conditions cannot be combined
> into one due to this issue (note that n>0 will be transformed into
> n!=0), when GCC sees n<4, it will consider the possibility that n may
> be equal to 0, in which case the upper bound is UINT_MAX. If those two
> conditions can be combined into one, which is n-1<=2, then we can get
> the correct upper bound of the loop.

I've looked at the above testcase to see why we aren't able to determine
the number of iterations upper bound properly here.

That doesn't mean your patch is useless, though I must say I'm far from
convinced it is safe at the moment, and it also looks like quite an ugly
special case (trying to undo a VRP optimization, but only in one single
specific case).

The first problem is VRP, we end up with:
  <bb 5>:
  # n_1 = PHI <n_5(D)(4), n_7(6)>
  # r_3 = PHI <r_6(D)(4), r_8(6)>
  # RANGE [0, 4294967295]
  n_7 = n_1 + 4294967295;
  # RANGE [-2147483648, 2147483647] NONZERO 0x00000000fffffffe
  r_8 = r_3 * 2;
  if (n_7 != 0)
    goto <bb 6>;
  else
    goto <bb 7>;
  
  <bb 6>:
  goto <bb 5>;
- missing range on n_1, extremely conservative range on n_7.  With the
attached patch we get instead:
  <bb 5>:
  # RANGE [1, 3] NONZERO 0x0000000000000003
  # n_1 = PHI <n_5(D)(4), n_7(6)>
  # r_3 = PHI <r_6(D)(4), r_8(6)>
  # RANGE [0, 2] NONZERO 0x0000000000000003
  n_7 = n_1 + 4294967295;
  # RANGE [-2147483648, 2147483647] NONZERO 0x00000000fffffffe
  r_8 = r_3 * 2;
  if (n_7 != 0)
    goto <bb 6>;
  else
    goto <bb 7>;

  <bb 6>:
  goto <bb 5>;

The issue is that we use live_on_edge to determine whether it is desirable
to add ASSERT_EXPRs, but as we walk bbs in RPO order, we first enter
the loop through the bb with the exit edge, and thus live of the latch isn't
computed (and, generally, the propagation for live ignores DFS back edges
anyway).  Because in the above live_on_edge ((5)->(6), n_7) is false,
we don't add an ASSERT_EXPR that n_7 is != 0 in the latch bb, so during
iteration we start with n_1 being assumed [1, 3] (that is the range of the
assertion from the preceding conditions on n_5(D)), but in the next
iteration widen it to [0, 3] because we think n_7 can be [0, 2] in the
PHI, and thus end up with a uselessly wide range because we think the
subtraction can wrap around.  This patch improves live_on_edge for
empty latches by just looking at the PHIs on loop->header from the
latch -> header edge and noting which SSA_NAMEs are used there.

I had to add -fno-tree-vrp to 4 unroll_*.c tests, because they already
disable various tree optimizations and want to test unrolling of loops
iterating exactly twice, but with this change VRP is smart enough
to replace the PHI argument on the i IV:
-  # i_13 = PHI <i_8(3), 0(2)>
+  # i_13 = PHI <1(3), 0(2)>
(the loop iterates exactly twice), and RTL unrolling doesn't do the
tested thing in that case.  But this should affect only loops that roll
exactly twice, and those really should be unrolled far earlier, so IMHO
it isn't something to try to optimize better at the RTL level.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2013-11-06  Jakub Jelinek  <jakub@redhat.com>

	* tree-vrp.c (live_on_edge): If e->dest is an empty latch
	of some loop, but live[e->dest->index] is not computed, look
	for SSA_NAMEs used in loop header PHIs from the latch edge.

	* gcc.dg/unroll_1.c: Add -fno-tree-vrp to dg-options.
	* gcc.dg/unroll_2.c: Likewise.
	* gcc.dg/unroll_3.c: Likewise.
	* gcc.dg/unroll_4.c: Likewise.



	Jakub

Comments

Jeff Law Nov. 6, 2013, 10:06 p.m. UTC | #1
On 11/06/13 10:13, Jakub Jelinek wrote:
> [...]
OK with a testcase verifying we get the tighter ranges.

I wonder a bit how often this situation occurs in practice
(an unnecessarily wide range due to a missing ASSERT_EXPR on the latch edge).

jeff
Richard Biener Nov. 7, 2013, 8:48 a.m. UTC | #2
On Wed, 6 Nov 2013, Jakub Jelinek wrote:

> [...]
> 
> --- gcc/tree-vrp.c.jj	2013-11-06 08:48:01.000000000 +0100
> +++ gcc/tree-vrp.c	2013-11-06 09:32:19.205104029 +0100
> @@ -92,6 +92,42 @@ static sbitmap *live;
>  static bool
>  live_on_edge (edge e, tree name)
>  {
> +  if (live[e->dest->index] == NULL)
> +    {

I'd rather have live[] computed "correctly" than the following,
which I think is a hack ... as you say, the issue is ordering
(or rather that there isn't an "order" for CFG cycles unless
we want to iterate).  For loop BB order we explicitly handle
the latch; maybe just using a different order than
RPO order, with special-casing for the latch, makes more sense?

Richard.

> +      /* Handle edges to empty latch blocks.  If NAME appears
> +	 in loop header phis on edges from latch, return true
> +	 and remember those SSA_NAMEs.  */
> +      basic_block bb = e->dest;
> +      if (bb->loop_father
> +	  && bb->loop_father->latch == bb
> +	  && single_succ_p (bb)
> +	  && empty_block_p (bb))
> +	{
> +	  gimple_stmt_iterator gsi;
> +	  edge e2 = single_succ_edge (bb);
> +	  for (gsi = gsi_start_phis (e2->dest);
> +	       !gsi_end_p (gsi);
> +	       gsi_next (&gsi))
> +	    {
> +	      gimple phi = gsi_stmt (gsi);
> +	      tree res = gimple_phi_result (phi), arg;
> +
> +	      if (virtual_operand_p (res))
> +		continue;
> +	      arg = PHI_ARG_DEF_FROM_EDGE (phi, e2);
> +	      if (TREE_CODE (arg) == SSA_NAME)
> +		{
> +		  if (live[e->dest->index] == NULL)
> +		    {
> +		      live[e->dest->index] = sbitmap_alloc (num_ssa_names);
> +		      bitmap_clear (live[e->dest->index]);
> +		    }
> +		  bitmap_set_bit (live[e->dest->index],
> +				  SSA_NAME_VERSION (arg));
> +		}
> +	    }
> +	}
> +    }
>    return (live[e->dest->index]
>  	  && bitmap_bit_p (live[e->dest->index], SSA_NAME_VERSION (name)));
>  }
> [...]
Jakub Jelinek Nov. 7, 2013, 9:07 a.m. UTC | #3
On Thu, Nov 07, 2013 at 09:48:46AM +0100, Richard Biener wrote:
> > --- gcc/tree-vrp.c.jj	2013-11-06 08:48:01.000000000 +0100
> > +++ gcc/tree-vrp.c	2013-11-06 09:32:19.205104029 +0100
> > @@ -92,6 +92,42 @@ static sbitmap *live;
> >  static bool
> >  live_on_edge (edge e, tree name)
> >  {
> > +  if (live[e->dest->index] == NULL)
> > +    {
> 
> I'd rather have live[] computed "correctly" than the following
> which I think is a hack ... as you say the issue is ordering
> (or rather that there isn't an "order" for CFG cycles unless
> we want to iterate).  For loop BB order we explicitly handle
> the latch, maybe just using a different order than
> RPO order, with special-casing the latch, makes more sense?

But is there any order that would help?  Sure, we could tweak
the rpo/bb_rpo arrays and put latch bbs before any other bbs from the
same loop, but that would only help us in calling find_assert_locations_1
on the latch before other bbs in the loop.  But that call does nothing,
and we'd need to special-case the handling of live for the latch block
anyway, just not in live_on_edge, but in find_assert_locations instead.

Is that what you are looking for?

Because without that special-casing of latch edges, you'd need to
process the header rather than the latch first and propagate through DFS_BACK
edges, but then you couldn't propagate through the other edges, nor in this
testcase would the live set of the latch be computed when processing the
header, which is where the condition is.

	Jakub
Richard Biener Nov. 7, 2013, 9:32 a.m. UTC | #4
On Thu, 7 Nov 2013, Jakub Jelinek wrote:

> On Thu, Nov 07, 2013 at 09:48:46AM +0100, Richard Biener wrote:
> > [...]
> 
> But is there any order that would help?  Sure, we could say just tweak
> the rpo/bb_rpo arrays and put there latch bbs before any other bbs from the
> same loop, but that would only help us in calling find_assert_locations_1
> on the latch before other bbs in the loop.  But that call does nothing,
> and we'd need to special case handling of the live for the latch block
> anyway, just not in live_on_edge, but in find_assert_locations instead.
> 
> Is that what you are looking for?
> 
> Because without that special casing of handling latch edges, you'd need to
> process the header rather than latch first, and propagate through DFS_BACK
> edges, but then you couldn't propagate through other edges, nor in this
> testcase would be the latch live computed when processing the header which
> has there the condition.

I'm looking for an adjustment of the live computation, either by adjusting
the algorithm or by special-casing the latches.  For example,
with the following (untested, needs cleanup; the PHI processing
can be factored out from find_assert_locations_1 and re-used):

@@ -5904,6 +5898,27 @@ find_assert_locations (void)
   for (i = 0; i < rpo_cnt; ++i)
     bb_rpo[rpo[i]] = i;
 
+  /* Pre-seed loop latch liveness from loop header PHI nodes.  Due to
+     the order we compute liveness and insert asserts we otherwise
+     fail to insert asserts into the loop latch.  */
+  loop_p loop;
+  loop_iterator li;
+  FOR_EACH_LOOP (li, loop, 0)
+    {
+      i = loop->latch->index;
+      live[i] = sbitmap_alloc (num_ssa_names);
+      bitmap_clear (live[i]);
+      for (gimple_stmt_iterator gsi = gsi_start_phis (loop->header);
+          !gsi_end_p (gsi); gsi_next (&gsi))
+       {
+         gimple phi = gsi_stmt (gsi);
+         if (virtual_operand_p (gimple_phi_result (phi)))
+           continue;
+         for (unsigned j = 0; j < gimple_phi_num_args (phi); ++j)
+           if (TREE_CODE (gimple_phi_arg_def (phi, j)) == SSA_NAME)
+             bitmap_set_bit (live[i],
+                             SSA_NAME_VERSION (gimple_phi_arg_def (phi, j)));
+       }
+    }

Patch

--- gcc/tree-vrp.c.jj	2013-11-06 08:48:01.000000000 +0100
+++ gcc/tree-vrp.c	2013-11-06 09:32:19.205104029 +0100
@@ -92,6 +92,42 @@  static sbitmap *live;
 static bool
 live_on_edge (edge e, tree name)
 {
+  if (live[e->dest->index] == NULL)
+    {
+      /* Handle edges to empty latch blocks.  If NAME appears
+	 in loop header phis on edges from latch, return true
+	 and remember those SSA_NAMEs.  */
+      basic_block bb = e->dest;
+      if (bb->loop_father
+	  && bb->loop_father->latch == bb
+	  && single_succ_p (bb)
+	  && empty_block_p (bb))
+	{
+	  gimple_stmt_iterator gsi;
+	  edge e2 = single_succ_edge (bb);
+	  for (gsi = gsi_start_phis (e2->dest);
+	       !gsi_end_p (gsi);
+	       gsi_next (&gsi))
+	    {
+	      gimple phi = gsi_stmt (gsi);
+	      tree res = gimple_phi_result (phi), arg;
+
+	      if (virtual_operand_p (res))
+		continue;
+	      arg = PHI_ARG_DEF_FROM_EDGE (phi, e2);
+	      if (TREE_CODE (arg) == SSA_NAME)
+		{
+		  if (live[e->dest->index] == NULL)
+		    {
+		      live[e->dest->index] = sbitmap_alloc (num_ssa_names);
+		      bitmap_clear (live[e->dest->index]);
+		    }
+		  bitmap_set_bit (live[e->dest->index],
+				  SSA_NAME_VERSION (arg));
+		}
+	    }
+	}
+    }
   return (live[e->dest->index]
 	  && bitmap_bit_p (live[e->dest->index], SSA_NAME_VERSION (name)));
 }
--- gcc/testsuite/gcc.dg/unroll_1.c.jj	2013-09-09 11:32:36.000000000 +0200
+++ gcc/testsuite/gcc.dg/unroll_1.c	2013-11-06 17:10:52.900722932 +0100
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-rtl-loop2_unroll=stderr -fno-peel-loops -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll" } */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll=stderr -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll" } */
 
 unsigned a[100], b[100];
 inline void bar()
--- gcc/testsuite/gcc.dg/unroll_2.c.jj	2013-08-30 14:38:39.000000000 +0200
+++ gcc/testsuite/gcc.dg/unroll_2.c	2013-11-06 17:10:30.751845504 +0100
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo -fenable-rtl-loop2_unroll" } */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo -fenable-rtl-loop2_unroll" } */
 
 unsigned a[100], b[100];
 inline void bar()
--- gcc/testsuite/gcc.dg/unroll_3.c.jj	2013-08-30 14:38:39.000000000 +0200
+++ gcc/testsuite/gcc.dg/unroll_3.c	2013-11-06 17:10:38.864800338 +0100
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo" } */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo" } */
 
 unsigned a[100], b[100];
 inline void bar()
--- gcc/testsuite/gcc.dg/unroll_4.c.jj	2013-08-30 14:38:40.000000000 +0200
+++ gcc/testsuite/gcc.dg/unroll_4.c	2013-11-06 17:11:03.816665603 +0100
@@ -1,5 +1,5 @@ 
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo2" } */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp -fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo2" } */
 
 unsigned a[100], b[100];
 inline void bar()