Patchwork Fix PR43567 (-ftree-loop-linear)

login
register
mail settings
Submitter Tom de Vries
Date July 10, 2010, 10:29 p.m.
Message ID <1278800998.21431.440.camel@linux1>
Download mbox | patch
Permalink /patch/58498/
State New
Headers show

Comments

Tom de Vries - July 10, 2010, 10:29 p.m.
The testcase pr43567-1.c (see patch below) does not return 0 as it should:
...
$ gcc -O2 -ftree-loop-linear -fno-tree-ch pr43567-1.c ; ./a.out ; echo $?
1
$
...

This is probably a duplicate of bug 20612. When the patch from bug 20612 in
lambda_loopnest_to_gcc_loopnest is reverted, the compiler ICEs in verify_ssa, 
with a similar message as in bug 20612:
...
pr43567-1.c: In function ‘test’:
pr43567-1.c:5:6: error: definition in block 3 does not dominate use in block 4
for SSA_NAME: lnivtmp.6_20 in statement:
if (lletmp.8_17 >= lnivtmp.6_20)
pr43567-1.c:5:6: internal compiler error: verify_ssa failed
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
...

When we stop after lambda_loopnest_to_gcc_loopnest, and dump the function test,
it looks like this (without the patch):
...
<bb 2>:
  goto <bb 6>;
<bb 3>:
  D.2742_6 = (long unsigned int) lnivtmp.3_5;
  D.2743_7 = D.2742_6 * 4;
  D.2744_9 = a_8(D) + D.2743_7;
  *D.2744_9 = 0;
  j_10 = lnivtmp.3_5 + 1;
  lnivtmp.6_20 = lnivtmp.6_18 + 1;
<bb 4>:
  # j_2 = PHI <0(7), j_10(3)>
  # lnivtmp.6_18 = PHI <0(7), lnivtmp.6_20(3)>
  lletmp.8_17 = n_4(D) + -1;
  if (lletmp.8_17 >= lnivtmp.6_20)
    goto <bb 3>;
  else
    goto <bb 5>;
<bb 5>:
  i_11 = lnivtmp.6_18 + 1;
  lnivtmp.3_19 = lnivtmp.3_5 + 1;
<bb 6>:
  # i_1 = PHI <0(2), i_11(5)>
  # lnivtmp.3_5 = PHI <0(2), lnivtmp.3_19(5)>
  lletmp.5_16 = n_4(D) + -1;
  if (lletmp.5_16 >= lnivtmp.3_19)
    goto <bb 7>;
  else
    goto <bb 8>;
<bb 7>:
  goto <bb 4>;
<bb 8>:
  return;
...

The problem is the following: in bb4, in the comparison 
'lletmp.8_17 >= lnivtmp.6_20', we use lnivtmp.6_20. This ssavar has its def 
(the increment 'lnivtmp.6_20 = lnivtmp.6_18 + 1') in bb3, which does not 
dominate the use in bb4, so this is indeed illegal ssa.
What should have been used instead in the comparison, is the local value of the 
iv: lnivtmp.6_18, defined by ' lnivtmp.6_18 = PHI <0(7), lnivtmp.6_20(3)>'.

The patch from bug 20612 in lambda_loopnest_to_gcc_loopnest prevents the
ssa_verify failure, but does not generate correct code. Representation at the 
same point (with that patch):
...
<bb 2>:
  goto <bb 6>;
<bb 3>:
  D.2742_6 = (long unsigned int) lnivtmp.3_5;
  D.2743_7 = D.2742_6 * 4;
  D.2744_9 = a_8(D) + D.2743_7;
  *D.2744_9 = 0;
  j_10 = lnivtmp.3_5 + 1;
  lnivtmp.6_21 = lnivtmp.6_20 + 1;
<bb 4>:
  # j_2 = PHI <0(7), j_10(3)>
  # lnivtmp.6_20 = PHI <0(7), lnivtmp.6_21(3)>
  lletmp.8_18 = n_4(D) + -1;
  lnivtmp.6_22 = lnivtmp.6_20 + 1;
  if (lletmp.8_18 >= lnivtmp.6_22)
    goto <bb 3>;
  else
    goto <bb 5>;
<bb 5>:
  i_11 = lnivtmp.6_20 + 1;
  lnivtmp.3_19 = lnivtmp.3_5 + 1;
<bb 6>:
  # i_1 = PHI <0(2), i_11(5)>
  # lnivtmp.3_5 = PHI <0(2), lnivtmp.3_19(5)>
  lletmp.5_16 = n_4(D) + -1;
  lnivtmp.3_17 = lnivtmp.3_5 + 1;
  if (lletmp.5_16 >= lnivtmp.3_17)
    goto <bb 7>;
  else
    goto <bb 8>;
<bb 7>:
  goto <bb 4>;
<bb 8>:
  return;
...

The comparison 'lletmp.8_18 >= lnivtmp.6_22' uses lnivtmp.6_22, defined by the
extra increment 'lnivtmp.6_22 = lnivtmp.6_20 + 1'. This extra increment however
uses lnivtmp.6_20, defined by 'lnivtmp.6_20 = PHI <0(7), lnivtmp.6_21(3)>',
which uses lnivtmp.6_21 defined by the original increment 
'lnivtmp.6_21 = lnivtmp.6_20 + 1'. So, the ivvar is incremented twice before
testing it in the comparison. 

The code that adds the extra insert is not correct currently, but I can't say
whether the original patch was already wrong, or that it was correct but got
corrupted in a later checkin (there have been a few). Either way, I tried
to fix the problem without using an extra increment.

Why doesn't this problem exist for -ftree-ch? Take a look at the representation
at the same point (again without the patch):
...
<bb 2>:
  if (n_4(D) > 0)
    goto <bb 3>;
  else
    goto <bb 9>;
<bb 3>:
  goto <bb 8>;
<bb 4>:
<bb 5>:
  # j_22 = PHI <j_10(4), 0(8)>
  # lnivtmp.10_12 = PHI <lnivtmp.10_2(4), 0(8)>
  lletmp.12_1 = n_4(D) + -1;
  D.2742_6 = (long unsigned int) lnivtmp.7_20;
  D.2743_7 = D.2742_6 * 4;
  D.2744_9 = a_8(D) + D.2743_7;
  *D.2744_9 = 0;
  j_10 = lnivtmp.7_20 + 1;
  lnivtmp.10_2 = lnivtmp.10_12 + 1;
  if (lletmp.12_1 >= lnivtmp.10_2)
    goto <bb 4>;
  else
    goto <bb 6>;
<bb 6>:
  lletmp.9_21 = n_4(D) + -1;
  i_11 = lnivtmp.10_12 + 1;
  lnivtmp.7_13 = lnivtmp.7_20 + 1;
  if (lletmp.9_21 >= lnivtmp.7_13)
    goto <bb 7>;
  else
    goto <bb 9>;
<bb 7>:
<bb 8>:
  # i_18 = PHI <i_11(7), 0(3)>
  # lnivtmp.7_20 = PHI <lnivtmp.7_13(7), 0(3)>
  goto <bb 5>;
<bb 9>:
  return;
...

We see that the increment 'lnivtmp.10_2 = lnivtmp.10_12 + 1' is positioned in
front of the comparison 'lletmp.12_1 >= lnivtmp.10_2', and the comparison 
correctly uses the local value of the iv, which is the result of the increment.

The patch reverts the original fix, and fixes the issue by testing for the case
that the increment does not dominate the comparison, and then using the
appropriate local ivvar.

The patch was bootstrapped and regression tested (gcc, objc, gfortran, g++,
libgomp, libstdc++, libjava, libmudflap, libffi) on x86_64-unknown-linux-gnu
with and without patch, with similar results (apart from execution failure of
pr43567-1.c, which fails without patch).

Ok for trunk? I don't have copyright assignment (started that process though) 
or write access.

	PR tree-optimization/43567
	* lambda-code.c (lambda_loopnest_to_gcc_loopnest): Check for the 
	case that the increment does not dominate the comparison, and use
	the appropriate ivvar.
Sebastian Pop - July 12, 2010, 12:52 a.m.
On Sat, Jul 10, 2010 at 17:29, Tom de Vries <tjvries@xs4all.nl> wrote:
> The testcase pr43567-1.c (see patch below) does not return 0 as it should:
> ...
> $ gcc -O2 -ftree-loop-linear -fno-tree-ch pr43567-1.c ; ./a.out ; echo $?
> 1
> $
> ...
>
> This is probably a duplicate of bug 20612. When the patch from bug 20612 in
> lambda_loopnest_to_gcc_loopnest is reverted, the compiler ICEs in verify_ssa,
> with a similar message as in bug 20612:
> ...
> pr43567-1.c: In function ‘test’:
> pr43567-1.c:5:6: error: definition in block 3 does not dominate use in block 4
> for SSA_NAME: lnivtmp.6_20 in statement:
> if (lletmp.8_17 >= lnivtmp.6_20)
> pr43567-1.c:5:6: internal compiler error: verify_ssa failed
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <http://gcc.gnu.org/bugs.html> for instructions.
> ...
>
> When we stop after lambda_loopnest_to_gcc_loopnest, and dump the function test,
> it looks like this (without the patch):
> ...
> <bb 2>:
>  goto <bb 6>;
> <bb 3>:
>  D.2742_6 = (long unsigned int) lnivtmp.3_5;
>  D.2743_7 = D.2742_6 * 4;
>  D.2744_9 = a_8(D) + D.2743_7;
>  *D.2744_9 = 0;
>  j_10 = lnivtmp.3_5 + 1;
>  lnivtmp.6_20 = lnivtmp.6_18 + 1;
> <bb 4>:
>  # j_2 = PHI <0(7), j_10(3)>
>  # lnivtmp.6_18 = PHI <0(7), lnivtmp.6_20(3)>
>  lletmp.8_17 = n_4(D) + -1;
>  if (lletmp.8_17 >= lnivtmp.6_20)
>    goto <bb 3>;
>  else
>    goto <bb 5>;
> <bb 5>:
>  i_11 = lnivtmp.6_18 + 1;
>  lnivtmp.3_19 = lnivtmp.3_5 + 1;
> <bb 6>:
>  # i_1 = PHI <0(2), i_11(5)>
>  # lnivtmp.3_5 = PHI <0(2), lnivtmp.3_19(5)>
>  lletmp.5_16 = n_4(D) + -1;
>  if (lletmp.5_16 >= lnivtmp.3_19)
>    goto <bb 7>;
>  else
>    goto <bb 8>;
> <bb 7>:
>  goto <bb 4>;
> <bb 8>:
>  return;
> ...
>
> The problem is the following: in bb4, in the comparison
> 'lletmp.8_17 >= lnivtmp.6_20', we use lnivtmp.6_20. This ssavar has its def
> (the increment 'lnivtmp.6_20 = lnivtmp.6_18 + 1') in bb3, which does not
> dominate the use in bb4, so this is indeed illegal ssa.
> What should have been used instead in the comparison, is the local value of the
> iv: lnivtmp.6_18, defined by ' lnivtmp.6_18 = PHI <0(7), lnivtmp.6_20(3)>'.
>
> The patch from bug 20612 in lambda_loopnest_to_gcc_loopnest prevents the
> ssa_verify failure, but does not generate correct code. Representation at the
> same point (with that patch):
> ...
> <bb 2>:
>  goto <bb 6>;
> <bb 3>:
>  D.2742_6 = (long unsigned int) lnivtmp.3_5;
>  D.2743_7 = D.2742_6 * 4;
>  D.2744_9 = a_8(D) + D.2743_7;
>  *D.2744_9 = 0;
>  j_10 = lnivtmp.3_5 + 1;
>  lnivtmp.6_21 = lnivtmp.6_20 + 1;
> <bb 4>:
>  # j_2 = PHI <0(7), j_10(3)>
>  # lnivtmp.6_20 = PHI <0(7), lnivtmp.6_21(3)>
>  lletmp.8_18 = n_4(D) + -1;
>  lnivtmp.6_22 = lnivtmp.6_20 + 1;
>  if (lletmp.8_18 >= lnivtmp.6_22)
>    goto <bb 3>;
>  else
>    goto <bb 5>;
> <bb 5>:
>  i_11 = lnivtmp.6_20 + 1;
>  lnivtmp.3_19 = lnivtmp.3_5 + 1;
> <bb 6>:
>  # i_1 = PHI <0(2), i_11(5)>
>  # lnivtmp.3_5 = PHI <0(2), lnivtmp.3_19(5)>
>  lletmp.5_16 = n_4(D) + -1;
>  lnivtmp.3_17 = lnivtmp.3_5 + 1;
>  if (lletmp.5_16 >= lnivtmp.3_17)
>    goto <bb 7>;
>  else
>    goto <bb 8>;
> <bb 7>:
>  goto <bb 4>;
> <bb 8>:
>  return;
> ...
>
> The comparison 'lletmp.8_18 >= lnivtmp.6_22' uses lnivtmp.6_22, defined by the
> extra increment 'lnivtmp.6_22 = lnivtmp.6_20 + 1'. This extra increment however
> uses lnivtmp.6_20, defined by 'lnivtmp.6_20 = PHI <0(7), lnivtmp.6_21(3)>',
> which uses lnivtmp.6_21 defined by the original increment
> 'lnivtmp.6_21 = lnivtmp.6_20 + 1'. So, the ivvar is incremented twice before
> testing it in the comparison.
>
> The code that adds the extra insert is not correct currently, but I can't say
> whether the original patch was already wrong, or that it was correct but got
> corrupted in a later checkin (there have been a few). Either way, I tried
> to fix the problem without using an extra increment.
>
> Why doesn't this problem exist for -ftree-ch? Take a look at the representation
> at the same point (again without the patch):
> ...
> <bb 2>:
>  if (n_4(D) > 0)
>    goto <bb 3>;
>  else
>    goto <bb 9>;
> <bb 3>:
>  goto <bb 8>;
> <bb 4>:
> <bb 5>:
>  # j_22 = PHI <j_10(4), 0(8)>
>  # lnivtmp.10_12 = PHI <lnivtmp.10_2(4), 0(8)>
>  lletmp.12_1 = n_4(D) + -1;
>  D.2742_6 = (long unsigned int) lnivtmp.7_20;
>  D.2743_7 = D.2742_6 * 4;
>  D.2744_9 = a_8(D) + D.2743_7;
>  *D.2744_9 = 0;
>  j_10 = lnivtmp.7_20 + 1;
>  lnivtmp.10_2 = lnivtmp.10_12 + 1;
>  if (lletmp.12_1 >= lnivtmp.10_2)
>    goto <bb 4>;
>  else
>    goto <bb 6>;
> <bb 6>:
>  lletmp.9_21 = n_4(D) + -1;
>  i_11 = lnivtmp.10_12 + 1;
>  lnivtmp.7_13 = lnivtmp.7_20 + 1;
>  if (lletmp.9_21 >= lnivtmp.7_13)
>    goto <bb 7>;
>  else
>    goto <bb 9>;
> <bb 7>:
> <bb 8>:
>  # i_18 = PHI <i_11(7), 0(3)>
>  # lnivtmp.7_20 = PHI <lnivtmp.7_13(7), 0(3)>
>  goto <bb 5>;
> <bb 9>:
>  return;
> ...
>
> We see that the increment 'lnivtmp.10_2 = lnivtmp.10_12 + 1' is positioned in
> front of the comparison 'lletmp.12_1 >= lnivtmp.10_2', and the comparison
> correctly uses the local value of the iv, which is the result of the increment.
>
> The patch reverts the original fix, and fixes the issue by testing for the case
> that the increment does not dominate the comparison, and then using the
> appropriate local ivvar.
>
> The patch was bootstrapped and regression tested (gcc, objc, gfortran, g++,
> libgomp, libstdc++, libjava, libmudflap, libffi) on x86_64-unknown-linux-gnu
> with and without patch, with similar results (apart from execution failure of
> pr43567-1.c, which fails without patch).
>
> Ok for trunk? I don't have copyright assignment (started that process though)
> or write access.
>
>        PR tree-optimization/43567
>        * lambda-code.c (lambda_loopnest_to_gcc_loopnest): Check for the
>        case that the increment does not dominate the comparison, and use
>        the appropriate ivvar.

This looks good to me, but I cannot approve this change.
Richi, could you review this patch please?

Thanks,
Sebastian
Richard Guenther - July 12, 2010, 7:51 a.m.
On Mon, Jul 12, 2010 at 2:52 AM, Sebastian Pop <sebpop@gmail.com> wrote:
> On Sat, Jul 10, 2010 at 17:29, Tom de Vries <tjvries@xs4all.nl> wrote:
>> The testcase pr43567-1.c (see patch below) does not return 0 as it should:
>> ...
>> $ gcc -O2 -ftree-loop-linear -fno-tree-ch pr43567-1.c ; ./a.out ; echo $?
>> 1
>> $
>> ...
>>
>> This is probably a duplicate of bug 20612. When the patch from bug 20612 in
>> lambda_loopnest_to_gcc_loopnest is reverted, the compiler ICEs in verify_ssa,
>> with a similar message as in bug 20612:
>> ...
>> pr43567-1.c: In function ‘test’:
>> pr43567-1.c:5:6: error: definition in block 3 does not dominate use in block 4
>> for SSA_NAME: lnivtmp.6_20 in statement:
>> if (lletmp.8_17 >= lnivtmp.6_20)
>> pr43567-1.c:5:6: internal compiler error: verify_ssa failed
>> Please submit a full bug report,
>> with preprocessed source if appropriate.
>> See <http://gcc.gnu.org/bugs.html> for instructions.
>> ...
>>
>> When we stop after lambda_loopnest_to_gcc_loopnest, and dump the function test,
>> it looks like this (without the patch):
>> ...
>> <bb 2>:
>>  goto <bb 6>;
>> <bb 3>:
>>  D.2742_6 = (long unsigned int) lnivtmp.3_5;
>>  D.2743_7 = D.2742_6 * 4;
>>  D.2744_9 = a_8(D) + D.2743_7;
>>  *D.2744_9 = 0;
>>  j_10 = lnivtmp.3_5 + 1;
>>  lnivtmp.6_20 = lnivtmp.6_18 + 1;
>> <bb 4>:
>>  # j_2 = PHI <0(7), j_10(3)>
>>  # lnivtmp.6_18 = PHI <0(7), lnivtmp.6_20(3)>
>>  lletmp.8_17 = n_4(D) + -1;
>>  if (lletmp.8_17 >= lnivtmp.6_20)
>>    goto <bb 3>;
>>  else
>>    goto <bb 5>;
>> <bb 5>:
>>  i_11 = lnivtmp.6_18 + 1;
>>  lnivtmp.3_19 = lnivtmp.3_5 + 1;
>> <bb 6>:
>>  # i_1 = PHI <0(2), i_11(5)>
>>  # lnivtmp.3_5 = PHI <0(2), lnivtmp.3_19(5)>
>>  lletmp.5_16 = n_4(D) + -1;
>>  if (lletmp.5_16 >= lnivtmp.3_19)
>>    goto <bb 7>;
>>  else
>>    goto <bb 8>;
>> <bb 7>:
>>  goto <bb 4>;
>> <bb 8>:
>>  return;
>> ...
>>
>> The problem is the following: in bb4, in the comparison
>> 'lletmp.8_17 >= lnivtmp.6_20', we use lnivtmp.6_20. This ssavar has its def
>> (the increment 'lnivtmp.6_20 = lnivtmp.6_18 + 1') in bb3, which does not
>> dominate the use in bb4, so this is indeed illegal ssa.
>> What should have been used instead in the comparison, is the local value of the
>> iv: lnivtmp.6_18, defined by ' lnivtmp.6_18 = PHI <0(7), lnivtmp.6_20(3)>'.
>>
>> The patch from bug 20612 in lambda_loopnest_to_gcc_loopnest prevents the
>> ssa_verify failure, but does not generate correct code. Representation at the
>> same point (with that patch):
>> ...
>> <bb 2>:
>>  goto <bb 6>;
>> <bb 3>:
>>  D.2742_6 = (long unsigned int) lnivtmp.3_5;
>>  D.2743_7 = D.2742_6 * 4;
>>  D.2744_9 = a_8(D) + D.2743_7;
>>  *D.2744_9 = 0;
>>  j_10 = lnivtmp.3_5 + 1;
>>  lnivtmp.6_21 = lnivtmp.6_20 + 1;
>> <bb 4>:
>>  # j_2 = PHI <0(7), j_10(3)>
>>  # lnivtmp.6_20 = PHI <0(7), lnivtmp.6_21(3)>
>>  lletmp.8_18 = n_4(D) + -1;
>>  lnivtmp.6_22 = lnivtmp.6_20 + 1;
>>  if (lletmp.8_18 >= lnivtmp.6_22)
>>    goto <bb 3>;
>>  else
>>    goto <bb 5>;
>> <bb 5>:
>>  i_11 = lnivtmp.6_20 + 1;
>>  lnivtmp.3_19 = lnivtmp.3_5 + 1;
>> <bb 6>:
>>  # i_1 = PHI <0(2), i_11(5)>
>>  # lnivtmp.3_5 = PHI <0(2), lnivtmp.3_19(5)>
>>  lletmp.5_16 = n_4(D) + -1;
>>  lnivtmp.3_17 = lnivtmp.3_5 + 1;
>>  if (lletmp.5_16 >= lnivtmp.3_17)
>>    goto <bb 7>;
>>  else
>>    goto <bb 8>;
>> <bb 7>:
>>  goto <bb 4>;
>> <bb 8>:
>>  return;
>> ...
>>
>> The comparison 'lletmp.8_18 >= lnivtmp.6_22' uses lnivtmp.6_22, defined by the
>> extra increment 'lnivtmp.6_22 = lnivtmp.6_20 + 1'. This extra increment however
>> uses lnivtmp.6_20, defined by 'lnivtmp.6_20 = PHI <0(7), lnivtmp.6_21(3)>',
>> which uses lnivtmp.6_21 defined by the original increment
>> 'lnivtmp.6_21 = lnivtmp.6_20 + 1'. So, the ivvar is incremented twice before
>> testing it in the comparison.
>>
>> The code that adds the extra insert is not correct currently, but I can't say
>> whether the original patch was already wrong, or that it was correct but got
>> corrupted in a later checkin (there have been a few). Either way, I tried
>> to fix the problem without using an extra increment.
>>
>> Why doesn't this problem exist for -ftree-ch? Take a look at the representation
>> at the same point (again without the patch):
>> ...
>> <bb 2>:
>>  if (n_4(D) > 0)
>>    goto <bb 3>;
>>  else
>>    goto <bb 9>;
>> <bb 3>:
>>  goto <bb 8>;
>> <bb 4>:
>> <bb 5>:
>>  # j_22 = PHI <j_10(4), 0(8)>
>>  # lnivtmp.10_12 = PHI <lnivtmp.10_2(4), 0(8)>
>>  lletmp.12_1 = n_4(D) + -1;
>>  D.2742_6 = (long unsigned int) lnivtmp.7_20;
>>  D.2743_7 = D.2742_6 * 4;
>>  D.2744_9 = a_8(D) + D.2743_7;
>>  *D.2744_9 = 0;
>>  j_10 = lnivtmp.7_20 + 1;
>>  lnivtmp.10_2 = lnivtmp.10_12 + 1;
>>  if (lletmp.12_1 >= lnivtmp.10_2)
>>    goto <bb 4>;
>>  else
>>    goto <bb 6>;
>> <bb 6>:
>>  lletmp.9_21 = n_4(D) + -1;
>>  i_11 = lnivtmp.10_12 + 1;
>>  lnivtmp.7_13 = lnivtmp.7_20 + 1;
>>  if (lletmp.9_21 >= lnivtmp.7_13)
>>    goto <bb 7>;
>>  else
>>    goto <bb 9>;
>> <bb 7>:
>> <bb 8>:
>>  # i_18 = PHI <i_11(7), 0(3)>
>>  # lnivtmp.7_20 = PHI <lnivtmp.7_13(7), 0(3)>
>>  goto <bb 5>;
>> <bb 9>:
>>  return;
>> ...
>>
>> We see that the increment 'lnivtmp.10_2 = lnivtmp.10_12 + 1' is positioned in
>> front of the comparison 'lletmp.12_1 >= lnivtmp.10_2', and the comparison
>> correctly uses the local value of the iv, which is the result of the increment.
>>
>> The patch reverts the original fix, and fixes the issue by testing for the case
>> that the increment does not dominate the comparison, and then using the
>> appropriate local ivvar.
>>
>> The patch was bootstrapped and regression tested (gcc, objc, gfortran, g++,
>> libgomp, libstdc++, libjava, libmudflap, libffi) on x86_64-unknown-linux-gnu
>> with and without patch, with similar results (apart from execution failure of
>> pr43567-1.c, which fails without patch).
>>
>> Ok for trunk? I don't have copyright assignment (started that process though)
>> or write access.
>>
>>        PR tree-optimization/43567
>>        * lambda-code.c (lambda_loopnest_to_gcc_loopnest): Check for the
>>        case that the increment does not dominate the comparison, and use
>>        the appropriate ivvar.
>
> This looks good to me, but I cannot approve this change.
> Richi, could you review this patch please?

The patch doesn't look obviously correct to me.  It seems tailored
to the symptom (the IV increment doesn't dominate the use, so
use some other?!).  The dominance check also looks odd.

But maybe both are because of the wording.  I have no idea about
loop-linear, so I won't review the patch.

Richard.

> Thanks,
> Sebastian
>

Patch

Index: src/gcc/lambda-code.c
===================================================================
--- src/gcc/lambda-code.c	(revision 162004)
+++ src/gcc/lambda-code.c	(working copy)
@@ -1735,7 +1735,7 @@  lambda_loopnest_to_gcc_loopnest (struct 
       lambda_loop newloop;
       basic_block bb;
       edge exit;
-      tree ivvar, ivvarinced;
+      tree ivvar, ivvarinced, ivvarcond;
       gimple exitcond;
       gimple_seq stmts;
       enum tree_code testtype;
@@ -1743,7 +1743,6 @@  lambda_loopnest_to_gcc_loopnest (struct 
       lambda_linear_expression offset;
       tree type;
       bool insert_after;
-      gimple inc_stmt;
 
       oldiv = VEC_index (tree, old_ivs, i);
       type = TREE_TYPE (oldiv);
@@ -1799,20 +1798,9 @@  lambda_loopnest_to_gcc_loopnest (struct 
       create_iv (newlowerbound,
 		 build_int_cst (type, LL_STEP (newloop)),
 		 ivvar, temp, &bsi, insert_after, &ivvar,
-		 NULL);
+		 &ivvarinced);
 
-      /* Unfortunately, the incremented ivvar that create_iv inserted may not
-	 dominate the block containing the exit condition.
-	 So we simply create our own incremented iv to use in the new exit
-	 test,  and let redundancy elimination sort it out.  */
-      inc_stmt = gimple_build_assign_with_ops (PLUS_EXPR, SSA_NAME_VAR (ivvar),
-					       ivvar,
-					       build_int_cst (type, LL_STEP (newloop)));
-
-      ivvarinced = make_ssa_name (SSA_NAME_VAR (ivvar), inc_stmt);
-      gimple_assign_set_lhs (inc_stmt, ivvarinced);
-      bsi = gsi_for_stmt (exitcond);
-      gsi_insert_before (&bsi, inc_stmt, GSI_SAME_STMT);
+      gcc_assert (gimple_code (SSA_NAME_DEF_STMT (ivvar)) == GIMPLE_PHI);
 
       /* Replace the exit condition with the new upper bound
          comparison.  */
@@ -1826,7 +1814,17 @@  lambda_loopnest_to_gcc_loopnest (struct 
       if (exit->flags & EDGE_FALSE_VALUE)
 	testtype = swap_tree_comparison (testtype);
 
-      gimple_cond_set_condition (exitcond, testtype, newupperbound, ivvarinced);
+      if (bb != bsi.bb)
+	{
+	  /* The incremented ivvar that create_iv inserted does not
+	     dominate the block containing the exit condition. Use
+	     the phi that dominates exitcond instead.  */
+	  ivvarcond = ivvar;
+	}
+      else
+	ivvarcond = ivvarinced;
+
+      gimple_cond_set_condition (exitcond, testtype, newupperbound, ivvarcond);
       update_stmt (exitcond);
       VEC_replace (tree, new_ivs, i, ivvar);
 
Index: src/gcc/testsuite/gcc.dg/tree-ssa/pr43567-1.c
===================================================================
--- src/gcc/testsuite/gcc.dg/tree-ssa/pr43567-1.c	(revision 0)
+++ src/gcc/testsuite/gcc.dg/tree-ssa/pr43567-1.c	(revision 0)
@@ -0,0 +1,28 @@ 
+/* { dg-do run } */ 
+/* { dg-options "-O2 -fno-tree-ch -ftree-loop-linear -fdump-tree-ltrans-all" } */
+
+__attribute__((noinline))
+void test (int n, int *a)
+{
+  int i, j;
+
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      a[j] = 0;
+}
+
+int main ()
+{
+  int success;
+  int a[4] = {11 , 12, 13, 14 };
+  test (2, &a[1]);
+  success =
+    a[0] == 11
+    && a[1] == 0
+    && a[2] == 0
+    && a[3] == 14;
+  return !success;
+}
+
+/* { dg-final { scan-tree-dump-times "transformed loop" 1 "ltrans" } } */ 
+/* { dg-final { cleanup-tree-dump "ltrans" } } */
Index: src/gcc/testsuite/gcc.dg/tree-ssa/pr43567-2.c
===================================================================
--- src/gcc/testsuite/gcc.dg/tree-ssa/pr43567-2.c	(revision 0)
+++ src/gcc/testsuite/gcc.dg/tree-ssa/pr43567-2.c	(revision 0)
@@ -0,0 +1,28 @@ 
+/* { dg-do run } */ 
+/* { dg-options "-O2 -ftree-ch -ftree-loop-linear -fdump-tree-ltrans-all" } */
+
+__attribute__((noinline))
+void test (int n, int *a)
+{
+  int i, j;
+
+  for (i = 0; i < n; i++)
+    for (j = 0; j < n; j++)
+      a[j] = 0;
+}
+
+int main ()
+{
+  int success;
+  int a[4] = {11 , 12, 13, 14 };
+  test (2, &a[1]);
+  success =
+    a[0] == 11
+    && a[1] == 0
+    && a[2] == 0
+    && a[3] == 14;
+  return !success;
+}
+
+/* { dg-final { scan-tree-dump-times "transformed loop" 1 "ltrans" } } */ 
+/* { dg-final { cleanup-tree-dump "ltrans" } } */