Message ID | 5694CF85.1020708@mentor.com |
---|---|
State | New |
Headers | show |
On Tue, 12 Jan 2016, Tom de Vries wrote: > Hi, > > This patch fixes PR69110, a wrong-code bug in autopar. > > > I. > > consider testcase test.c: > ... > #define N 1000 > > unsigned int i = 0; > > static void __attribute__((noinline, noclone)) > foo (void) > { > unsigned int z; > > for (z = 0; z < N; ++z) > ++i; > } > > extern void abort (void); > > int > main (void) > { > foo (); > if (i != N) > abort (); > > return 0; > } > ... > > When compiled with -O1 -ftree-parallelize-loops=2 -fno-tree-loop-im, the test > fails: > ... > $ gcc test.c -O1 -ftree-parallelize-loops=2 -Wl,-rpath=$(pwd > -P)//install/lib64 -fno-tree-loop-im > $ ./a.out > Aborted (core dumped) > $ > ... > > > II. > > Before parloops, at ivcanon we have the loop body: > ... > <bb 3>: > # z_10 = PHI <z_7(4), 0(2)> > # ivtmp_12 = PHI <ivtmp_2(4), 1000(2)> > i.1_4 = i; > _5 = i.1_4 + 1; > i = _5; > z_7 = z_10 + 1; > ivtmp_2 = ivtmp_12 - 1; > if (ivtmp_2 != 0) > goto <bb 4>; > else > goto <bb 5>; > ... > > There's a loop-carried dependency in i, that is, the read from i in iteration > z == 1 depends on the write to i in iteration z == 0. So the loop cannot be > parallelized. The test-case fails because parloops still parallelizes the > loop. > > > III. > > Since the loop carried dependency is in-memory, it is not handled by the code > analyzing reductions, since that code ignores the virtual phi. > > So, AFAIU, this loop carried dependency should be handled by the dependency > testing in loop_parallel_p. And loop_parallel_p returns true for this loop. > > A comment in loop_parallel_p reads: "Check for problems with dependences. If > the loop can be reversed, the iterations are independent." > > AFAIU, the loop order can actually be reversed. But, it cannot be executed in > parallel. > > So from this perspective, it seems in this case the comment matches the check, > but the check is not sufficient. > > > IV. > > OTOH, if we replace the declaration of i with i[1], and replace the references > of i with i[0], we see that loop_parallel_p fails. So the loop_parallel_p > check in this case seems sufficient, and there's something else that causes > the check to fail in this case. > > The difference is in the generated data ref: > - in the 'i[1]' case, we set DR_ACCESS_FNS in dr_analyze_indices to > vector with a single element: access function 0. > - in the 'i' case, we set DR_ACCESS_FNS to NULL. > > This difference causes different handling in the dependency generation, in > particular in add_distance_for_zero_overlaps which has no effect for the 'i' > case because DDR_NUM_SUBSCRIPTS (ddr) == 0 (as a consequence of the NULL > access_fns of both the source and sink data refs). > > From this perspective, it seems that the loop_parallel_p check is sufficient, > and that dr_analyze_indices shouldn't return a NULL access_fns for 'i'. > > > V. > > When compiling with graphite using -floop-parallelize-all --param > graphite-min-loops-per-function=1, we find: > ... > [scop-detection-fail] Graphite cannot handle data-refs in stmt: > # VUSE <.MEM_11> > i.1_4 = i; > ... > > The function scop_detection::stmt_has_simple_data_refs_p returns false because > of the code recently added for PR66980 at r228357: > ... > int nb_subscripts = DR_NUM_DIMENSIONS (dr); > > if (nb_subscripts < 1) > { > free_data_refs (drs); > return false; > } > ... > > [ DR_NUM_DIMENSIONS (dr) is 0 as a consequence of the NULL access_fns. ] > > This code labels DR_NUM_DIMENSIONS (dr) == 0 as 'data reference analysis has > failed'. > > From this perspective, it seems that the dependence handling should bail out > once it finds a data ref with DR_NUM_DIMENSIONS (dr) == 0 (or DR_ACCESS_FNS == > 0). > > > VI. > > This test-case used to pass in 4.6 because in find_data_references_in_stmt we > had: > ... > /* FIXME -- data dependence analysis does not work correctly for > objects with invariant addresses in loop nests. Let us fail > here until the problem is fixed. */ > if (dr_address_invariant_p (dr) && nest) > { > free_data_ref (dr); > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, > "\tFAILED as dr address is invariant\n"); > ret = false; > break; > } > ... > > That FIXME was removed in the fix for PR46787, at r175704. > > The test-case fails in 4.8, and I guess from there onwards. > > > VII. > > The attached patch fixes the problem by returning a zero access function for > 'i' in dr_analyze_indices. > > [ But I can also imagine a solution similar to the graphite fix: > ... > @@ -3997,6 +3999,12 @@ find_data_references_in_stmt > dr = create_data_ref (nest, loop_containing_stmt (stmt), > ref->ref, stmt, ref->is_read); > gcc_assert (dr != NULL); > + if (DR_NUM_DIMENSIONS (dr) == 0) > + { > + datarefs->release (); > + return false; > + } > + > datarefs->safe_push (dr); > } > references.release (); > ... > > I'm not familiar enough with the dependency analysis code to know where > exactly this should be fixed. ] > > Bootstrapped and reg-tested on x86_64. > > OK for trunk? > > OK for release branches? Bah - attachement [please post patches inline] So without quote ;) Doesnt' the same issue apply to > unsigned int *p; > > static void __attribute__((noinline, noclone)) > foo (void) > { > unsigned int z; > > for (z = 0; z < N; ++z) > ++(*p); > } thus when we have a MEM_REF[p_1]? SCEV will not analyze its evolution to a POLYNOMIAL_CHREC and thus access_fns will be NULL again. I think avoiding a NULL access_fns is ok but it should be done unconditionally, not only for the DECL_P case. Thanks, Richard.
Don't return NULL access_fns in dr_analyze_indices 2016-01-12 Tom de Vries <tom@codesourcery.com> * tree-data-ref.c (dr_analyze_indices): Don't return NULL access_fns. * gcc.dg/autopar/pr69110.c: New test. * testsuite/libgomp.c/pr69110.c: New test. --- gcc/testsuite/gcc.dg/autopar/pr69110.c | 20 ++++++++++++++++++++ gcc/tree-data-ref.c | 3 +++ libgomp/testsuite/libgomp.c/pr69110.c | 27 +++++++++++++++++++++++++++ 3 files changed, 50 insertions(+) diff --git a/gcc/testsuite/gcc.dg/autopar/pr69110.c b/gcc/testsuite/gcc.dg/autopar/pr69110.c new file mode 100644 index 0000000..0eca411 --- /dev/null +++ b/gcc/testsuite/gcc.dg/autopar/pr69110.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O1 -ftree-parallelize-loops=2 -fno-tree-loop-im -fdump-tree-parloops-details" } */ + +#define N 1000 + +unsigned int i = 0; + +void +foo (void) +{ + unsigned int z; + unsigned int *p = &i; + for (z = 0; z < N; ++z) + ++i; +} + +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 0 "parloops" } } */ +/* { dg-final { scan-tree-dump-times "FAILED: data dependencies exist across iterations" 1 "parloops" } } */ + + diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c index a40f40d..862589b 100644 --- a/gcc/tree-data-ref.c +++ b/gcc/tree-data-ref.c @@ -1021,6 +1021,9 @@ dr_analyze_indices (struct data_reference *dr, loop_p nest, loop_p loop) ref = build2 (MEM_REF, TREE_TYPE (ref), build_fold_addr_expr (ref), build_int_cst (reference_alias_ptr_type (ref), 0)); + + if (access_fns == vNULL) + access_fns.safe_push (integer_zero_node); } DR_BASE_OBJECT (dr) = ref; diff --git a/libgomp/testsuite/libgomp.c/pr69110.c b/libgomp/testsuite/libgomp.c/pr69110.c new file mode 100644 index 0000000..049f014 --- /dev/null +++ b/libgomp/testsuite/libgomp.c/pr69110.c @@ -0,0 +1,27 @@ +/* { dg-do run } */ +/* { dg-options "-ftree-parallelize-loops=2 -O1 -fno-tree-loop-im" } */ + +#define N 1000 + +unsigned int i = 0; + +static void __attribute__((noinline, noclone)) +foo (void) +{ + unsigned int z; + unsigned int *p = &i; + for (z = 0; z < N; ++z) + ++i; +} + +extern void abort (void); + +int +main (void) +{ + foo (); + if (i != N) + abort (); + + return 0; +}