From patchwork Sat Jul 10 22:29:58 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: Fix PR43567 (-ftree-loop-linear) Date: Sat, 10 Jul 2010 12:29:58 -0000 From: Tom de Vries X-Patchwork-Id: 58498 Message-Id: <1278800998.21431.440.camel@linux1> To: gcc-patches@gcc.gnu.org The testcase pr43567-1.c (see patch below) does not return 0 as it should: ... $ gcc -O2 -ftree-loop-linear -fno-tree-ch pr43567-1.c ; ./a.out ; echo $? 1 $ ... This is probably a duplicate of bug 20612. When the patch from bug 20612 in lambda_loopnest_to_gcc_loopnest is reverted, the compiler ICEs in verify_ssa, with a similar message as in bug 20612: ... pr43567-1.c: In function ‘test’: pr43567-1.c:5:6: error: definition in block 3 does not dominate use in block 4 for SSA_NAME: lnivtmp.6_20 in statement: if (lletmp.8_17 >= lnivtmp.6_20) pr43567-1.c:5:6: internal compiler error: verify_ssa failed Please submit a full bug report, with preprocessed source if appropriate. See for instructions. ... When we stop after lambda_loopnest_to_gcc_loopnest, and dump the function test, it looks like this (without the patch): ... : goto ; : D.2742_6 = (long unsigned int) lnivtmp.3_5; D.2743_7 = D.2742_6 * 4; D.2744_9 = a_8(D) + D.2743_7; *D.2744_9 = 0; j_10 = lnivtmp.3_5 + 1; lnivtmp.6_20 = lnivtmp.6_18 + 1; : # j_2 = PHI <0(7), j_10(3)> # lnivtmp.6_18 = PHI <0(7), lnivtmp.6_20(3)> lletmp.8_17 = n_4(D) + -1; if (lletmp.8_17 >= lnivtmp.6_20) goto ; else goto ; : i_11 = lnivtmp.6_18 + 1; lnivtmp.3_19 = lnivtmp.3_5 + 1; : # i_1 = PHI <0(2), i_11(5)> # lnivtmp.3_5 = PHI <0(2), lnivtmp.3_19(5)> lletmp.5_16 = n_4(D) + -1; if (lletmp.5_16 >= lnivtmp.3_19) goto ; else goto ; : goto ; : return; ... The problem is the following: in bb4, in the comparison 'lletmp.8_17 >= lnivtmp.6_20', we use lnivtmp.6_20. This ssavar has its def (the increment 'lnivtmp.6_20 = lnivtmp.6_18 + 1') in bb3, which does not dominate the use in bb4, so this is indeed illegal ssa. What should have been used instead in the comparison, is the local value of the iv: lnivtmp.6_18, defined by ' lnivtmp.6_18 = PHI <0(7), lnivtmp.6_20(3)>'. The patch from bug 20612 in lambda_loopnest_to_gcc_loopnest prevents the ssa_verify failure, but does not generate correct code. Representation at the same point (with that patch): ... : goto ; : D.2742_6 = (long unsigned int) lnivtmp.3_5; D.2743_7 = D.2742_6 * 4; D.2744_9 = a_8(D) + D.2743_7; *D.2744_9 = 0; j_10 = lnivtmp.3_5 + 1; lnivtmp.6_21 = lnivtmp.6_20 + 1; : # j_2 = PHI <0(7), j_10(3)> # lnivtmp.6_20 = PHI <0(7), lnivtmp.6_21(3)> lletmp.8_18 = n_4(D) + -1; lnivtmp.6_22 = lnivtmp.6_20 + 1; if (lletmp.8_18 >= lnivtmp.6_22) goto ; else goto ; : i_11 = lnivtmp.6_20 + 1; lnivtmp.3_19 = lnivtmp.3_5 + 1; : # i_1 = PHI <0(2), i_11(5)> # lnivtmp.3_5 = PHI <0(2), lnivtmp.3_19(5)> lletmp.5_16 = n_4(D) + -1; lnivtmp.3_17 = lnivtmp.3_5 + 1; if (lletmp.5_16 >= lnivtmp.3_17) goto ; else goto ; : goto ; : return; ... The comparison 'lletmp.8_18 >= lnivtmp.6_22' uses lnivtmp.6_22, defined by the extra increment 'lnivtmp.6_22 = lnivtmp.6_20 + 1'. This extra increment however uses lnivtmp.6_20, defined by 'lnivtmp.6_20 = PHI <0(7), lnivtmp.6_21(3)>', which uses lnivtmp.6_21 defined by the original increment 'lnivtmp.6_21 = lnivtmp.6_20 + 1'. So, the ivvar is incremented twice before testing it in the comparison. The code that adds the extra insert is not correct currently, but I can't say whether the original patch was already wrong, or that it was correct but got corrupted in a later checkin (there have been a few). Either way, I tried to fix the problem without using an extra increment. Why doesn't this problem exist for -ftree-ch? Take a look at the representation at the same point (again without the patch): ... : if (n_4(D) > 0) goto ; else goto ; : goto ; : : # j_22 = PHI # lnivtmp.10_12 = PHI lletmp.12_1 = n_4(D) + -1; D.2742_6 = (long unsigned int) lnivtmp.7_20; D.2743_7 = D.2742_6 * 4; D.2744_9 = a_8(D) + D.2743_7; *D.2744_9 = 0; j_10 = lnivtmp.7_20 + 1; lnivtmp.10_2 = lnivtmp.10_12 + 1; if (lletmp.12_1 >= lnivtmp.10_2) goto ; else goto ; : lletmp.9_21 = n_4(D) + -1; i_11 = lnivtmp.10_12 + 1; lnivtmp.7_13 = lnivtmp.7_20 + 1; if (lletmp.9_21 >= lnivtmp.7_13) goto ; else goto ; : : # i_18 = PHI # lnivtmp.7_20 = PHI goto ; : return; ... We see that the increment 'lnivtmp.10_2 = lnivtmp.10_12 + 1' is positioned in front of the comparison 'lletmp.12_1 >= lnivtmp.10_2', and the comparison correctly uses the local value of the iv, which is the result of the increment. The patch reverts the original fix, and fixes the issue by testing for the case that the increment does not dominate the comparison, and then using the appropriate local ivvar. The patch was bootstrapped and regression tested (gcc, objc, gfortran, g++, libgomp, libstdc++, libjava, libmudflap, libffi) on x86_64-unknown-linux-gnu with and without patch, with similar results (apart from execution failure of pr43567-1.c, which fails without patch). Ok for trunk? I don't have copyright assignment (started that process though) or write access. PR tree-optimization/43567 * lambda-code.c (lambda_loopnest_to_gcc_loopnest): Check for the case that the increment does not dominate the comparison, and use the appropriate ivvar. Index: src/gcc/lambda-code.c =================================================================== --- src/gcc/lambda-code.c (revision 162004) +++ src/gcc/lambda-code.c (working copy) @@ -1735,7 +1735,7 @@ lambda_loopnest_to_gcc_loopnest (struct lambda_loop newloop; basic_block bb; edge exit; - tree ivvar, ivvarinced; + tree ivvar, ivvarinced, ivvarcond; gimple exitcond; gimple_seq stmts; enum tree_code testtype; @@ -1743,7 +1743,6 @@ lambda_loopnest_to_gcc_loopnest (struct lambda_linear_expression offset; tree type; bool insert_after; - gimple inc_stmt; oldiv = VEC_index (tree, old_ivs, i); type = TREE_TYPE (oldiv); @@ -1799,20 +1798,9 @@ lambda_loopnest_to_gcc_loopnest (struct create_iv (newlowerbound, build_int_cst (type, LL_STEP (newloop)), ivvar, temp, &bsi, insert_after, &ivvar, - NULL); + &ivvarinced); - /* Unfortunately, the incremented ivvar that create_iv inserted may not - dominate the block containing the exit condition. - So we simply create our own incremented iv to use in the new exit - test, and let redundancy elimination sort it out. */ - inc_stmt = gimple_build_assign_with_ops (PLUS_EXPR, SSA_NAME_VAR (ivvar), - ivvar, - build_int_cst (type, LL_STEP (newloop))); - - ivvarinced = make_ssa_name (SSA_NAME_VAR (ivvar), inc_stmt); - gimple_assign_set_lhs (inc_stmt, ivvarinced); - bsi = gsi_for_stmt (exitcond); - gsi_insert_before (&bsi, inc_stmt, GSI_SAME_STMT); + gcc_assert (gimple_code (SSA_NAME_DEF_STMT (ivvar)) == GIMPLE_PHI); /* Replace the exit condition with the new upper bound comparison. */ @@ -1826,7 +1814,17 @@ lambda_loopnest_to_gcc_loopnest (struct if (exit->flags & EDGE_FALSE_VALUE) testtype = swap_tree_comparison (testtype); - gimple_cond_set_condition (exitcond, testtype, newupperbound, ivvarinced); + if (bb != bsi.bb) + { + /* The incremented ivvar that create_iv inserted does not + dominate the block containing the exit condition. Use + the phi that dominates exitcond instead. */ + ivvarcond = ivvar; + } + else + ivvarcond = ivvarinced; + + gimple_cond_set_condition (exitcond, testtype, newupperbound, ivvarcond); update_stmt (exitcond); VEC_replace (tree, new_ivs, i, ivvar); Index: src/gcc/testsuite/gcc.dg/tree-ssa/pr43567-1.c =================================================================== --- src/gcc/testsuite/gcc.dg/tree-ssa/pr43567-1.c (revision 0) +++ src/gcc/testsuite/gcc.dg/tree-ssa/pr43567-1.c (revision 0) @@ -0,0 +1,28 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fno-tree-ch -ftree-loop-linear -fdump-tree-ltrans-all" } */ + +__attribute__((noinline)) +void test (int n, int *a) +{ + int i, j; + + for (i = 0; i < n; i++) + for (j = 0; j < n; j++) + a[j] = 0; +} + +int main () +{ + int success; + int a[4] = {11 , 12, 13, 14 }; + test (2, &a[1]); + success = + a[0] == 11 + && a[1] == 0 + && a[2] == 0 + && a[3] == 14; + return !success; +} + +/* { dg-final { scan-tree-dump-times "transformed loop" 1 "ltrans" } } */ +/* { dg-final { cleanup-tree-dump "ltrans" } } */ Index: src/gcc/testsuite/gcc.dg/tree-ssa/pr43567-2.c =================================================================== --- src/gcc/testsuite/gcc.dg/tree-ssa/pr43567-2.c (revision 0) +++ src/gcc/testsuite/gcc.dg/tree-ssa/pr43567-2.c (revision 0) @@ -0,0 +1,28 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -ftree-ch -ftree-loop-linear -fdump-tree-ltrans-all" } */ + +__attribute__((noinline)) +void test (int n, int *a) +{ + int i, j; + + for (i = 0; i < n; i++) + for (j = 0; j < n; j++) + a[j] = 0; +} + +int main () +{ + int success; + int a[4] = {11 , 12, 13, 14 }; + test (2, &a[1]); + success = + a[0] == 11 + && a[1] == 0 + && a[2] == 0 + && a[3] == 14; + return !success; +} + +/* { dg-final { scan-tree-dump-times "transformed loop" 1 "ltrans" } } */ +/* { dg-final { cleanup-tree-dump "ltrans" } } */