diff mbox

[PR69110] Don't return NULL access_fns in dr_analyze_indices

Message ID 5694CF85.1020708@mentor.com
State New
Headers show

Commit Message

Tom de Vries Jan. 12, 2016, 10:03 a.m. UTC
Hi,

This patch fixes PR69110, a wrong-code bug in autopar.


I.

consider testcase test.c:
...
#define N 1000

unsigned int i = 0;

static void __attribute__((noinline, noclone))
foo (void)
{
   unsigned int z;

   for (z = 0; z < N; ++z)
     ++i;
}

extern void abort (void);

int
main (void)
{
   foo ();
   if (i != N)
     abort ();

   return 0;
}
...

When compiled with -O1 -ftree-parallelize-loops=2 -fno-tree-loop-im, the 
test fails:
...
$ gcc test.c -O1 -ftree-parallelize-loops=2 -Wl,-rpath=$(pwd 
-P)//install/lib64 -fno-tree-loop-im
$ ./a.out
Aborted (core dumped)
$
...


II.

Before parloops, at ivcanon we have the loop body:
...
   <bb 3>:
   # z_10 = PHI <z_7(4), 0(2)>
   # ivtmp_12 = PHI <ivtmp_2(4), 1000(2)>
   i.1_4 = i;
   _5 = i.1_4 + 1;
   i = _5;
   z_7 = z_10 + 1;
   ivtmp_2 = ivtmp_12 - 1;
   if (ivtmp_2 != 0)
     goto <bb 4>;
   else
     goto <bb 5>;
...

There's a loop-carried dependency in i, that is, the read from i in 
iteration z == 1 depends on the write to i in iteration z == 0. So the 
loop cannot be parallelized. The test-case fails because parloops still 
parallelizes the loop.


III.

Since the loop carried dependency is in-memory, it is not handled by the 
code analyzing reductions, since that code ignores the virtual phi.

So, AFAIU, this loop carried dependency should be handled by the 
dependency testing in loop_parallel_p. And loop_parallel_p returns true 
for this loop.

A comment in loop_parallel_p reads: "Check for problems with 
dependences.  If the loop can be reversed, the iterations are independent."

AFAIU, the loop order can actually be reversed. But, it cannot be 
executed in parallel.

So from this perspective, it seems in this case the comment matches the 
check, but the check is not sufficient.


IV.

OTOH, if we replace the declaration of i with i[1], and replace the 
references of i with i[0], we see that loop_parallel_p fails.  So the 
loop_parallel_p check in this case seems sufficient, and there's 
something else that causes the check to fail in this case.

The difference is in the generated data ref:
- in the 'i[1]' case, we set DR_ACCESS_FNS in dr_analyze_indices to
   vector with a single element: access function 0.
- in the 'i' case, we set DR_ACCESS_FNS to NULL.

This difference causes different handling in the dependency generation, 
in particular in add_distance_for_zero_overlaps which has no effect for 
the 'i' case because  DDR_NUM_SUBSCRIPTS (ddr) == 0 (as a consequence of 
the NULL access_fns of both the source and sink data refs).

 From this perspective, it seems that the loop_parallel_p check is 
sufficient, and that dr_analyze_indices shouldn't return a NULL 
access_fns for 'i'.


V.

When compiling with graphite using -floop-parallelize-all --param 
graphite-min-loops-per-function=1, we find:
...
[scop-detection-fail] Graphite cannot handle data-refs in stmt:
# VUSE <.MEM_11>
i.1_4 = i;
...

The function scop_detection::stmt_has_simple_data_refs_p returns false 
because of the code recently added for PR66980 at r228357:
...
       int nb_subscripts = DR_NUM_DIMENSIONS (dr);

       if (nb_subscripts < 1)
	{
           free_data_refs (drs);
           return false;
         }
...

[ DR_NUM_DIMENSIONS (dr) is 0 as a consequence of the NULL access_fns. ]

This code labels DR_NUM_DIMENSIONS (dr) == 0 as 'data reference analysis 
has failed'.

 From this perspective, it seems that the dependence handling should 
bail out once it finds a data ref with DR_NUM_DIMENSIONS (dr) == 0 (or 
DR_ACCESS_FNS == 0).


VI.

This test-case used to pass in 4.6 because in 
find_data_references_in_stmt we had:
...
       /* FIXME -- data dependence analysis does not work correctly for
          objects with invariant addresses in loop nests.  Let us fail
          here until the problem is fixed.  */
       if (dr_address_invariant_p (dr) && nest)
	{
           free_data_ref (dr);
           if (dump_file && (dump_flags & TDF_DETAILS))
             fprintf (dump_file,
                      "\tFAILED as dr address is invariant\n");
           ret = false;
           break;
	}
...

That FIXME was removed in the fix for PR46787, at r175704.

The test-case fails in 4.8, and I guess from there onwards.


VII.

The attached patch fixes the problem by returning a zero access function 
for 'i' in dr_analyze_indices.

[ But I can also imagine a solution similar to the graphite fix:
...
@@ -3997,6 +3999,12 @@ find_data_references_in_stmt
        dr = create_data_ref (nest, loop_containing_stmt (stmt),
                             ref->ref, stmt, ref->is_read);
        gcc_assert (dr != NULL);
+      if (DR_NUM_DIMENSIONS (dr) == 0)
+       {
+         datarefs->release ();
+         return false;
+       }
+
        datarefs->safe_push (dr);
      }
    references.release ();
...

I'm not familiar enough with the dependency analysis code to know where 
exactly this should be fixed. ]

Bootstrapped and reg-tested on x86_64.

OK for trunk?

OK for release branches?

Thanks,
- Tom

Comments

Richard Biener Jan. 12, 2016, 11:22 a.m. UTC | #1
On Tue, 12 Jan 2016, Tom de Vries wrote:

> Hi,
> 
> This patch fixes PR69110, a wrong-code bug in autopar.
> 
> 
> I.
> 
> consider testcase test.c:
> ...
> #define N 1000
> 
> unsigned int i = 0;
> 
> static void __attribute__((noinline, noclone))
> foo (void)
> {
>   unsigned int z;
> 
>   for (z = 0; z < N; ++z)
>     ++i;
> }
> 
> extern void abort (void);
> 
> int
> main (void)
> {
>   foo ();
>   if (i != N)
>     abort ();
> 
>   return 0;
> }
> ...
> 
> When compiled with -O1 -ftree-parallelize-loops=2 -fno-tree-loop-im, the test
> fails:
> ...
> $ gcc test.c -O1 -ftree-parallelize-loops=2 -Wl,-rpath=$(pwd
> -P)//install/lib64 -fno-tree-loop-im
> $ ./a.out
> Aborted (core dumped)
> $
> ...
> 
> 
> II.
> 
> Before parloops, at ivcanon we have the loop body:
> ...
>   <bb 3>:
>   # z_10 = PHI <z_7(4), 0(2)>
>   # ivtmp_12 = PHI <ivtmp_2(4), 1000(2)>
>   i.1_4 = i;
>   _5 = i.1_4 + 1;
>   i = _5;
>   z_7 = z_10 + 1;
>   ivtmp_2 = ivtmp_12 - 1;
>   if (ivtmp_2 != 0)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
> ...
> 
> There's a loop-carried dependency in i, that is, the read from i in iteration
> z == 1 depends on the write to i in iteration z == 0. So the loop cannot be
> parallelized. The test-case fails because parloops still parallelizes the
> loop.
> 
> 
> III.
> 
> Since the loop carried dependency is in-memory, it is not handled by the code
> analyzing reductions, since that code ignores the virtual phi.
> 
> So, AFAIU, this loop carried dependency should be handled by the dependency
> testing in loop_parallel_p. And loop_parallel_p returns true for this loop.
> 
> A comment in loop_parallel_p reads: "Check for problems with dependences.  If
> the loop can be reversed, the iterations are independent."
> 
> AFAIU, the loop order can actually be reversed. But, it cannot be executed in
> parallel.
> 
> So from this perspective, it seems in this case the comment matches the check,
> but the check is not sufficient.
> 
> 
> IV.
> 
> OTOH, if we replace the declaration of i with i[1], and replace the references
> of i with i[0], we see that loop_parallel_p fails.  So the loop_parallel_p
> check in this case seems sufficient, and there's something else that causes
> the check to fail in this case.
> 
> The difference is in the generated data ref:
> - in the 'i[1]' case, we set DR_ACCESS_FNS in dr_analyze_indices to
>   vector with a single element: access function 0.
> - in the 'i' case, we set DR_ACCESS_FNS to NULL.
> 
> This difference causes different handling in the dependency generation, in
> particular in add_distance_for_zero_overlaps which has no effect for the 'i'
> case because  DDR_NUM_SUBSCRIPTS (ddr) == 0 (as a consequence of the NULL
> access_fns of both the source and sink data refs).
> 
> From this perspective, it seems that the loop_parallel_p check is sufficient,
> and that dr_analyze_indices shouldn't return a NULL access_fns for 'i'.
> 
> 
> V.
> 
> When compiling with graphite using -floop-parallelize-all --param
> graphite-min-loops-per-function=1, we find:
> ...
> [scop-detection-fail] Graphite cannot handle data-refs in stmt:
> # VUSE <.MEM_11>
> i.1_4 = i;
> ...
> 
> The function scop_detection::stmt_has_simple_data_refs_p returns false because
> of the code recently added for PR66980 at r228357:
> ...
>       int nb_subscripts = DR_NUM_DIMENSIONS (dr);
> 
>       if (nb_subscripts < 1)
> 	{
>           free_data_refs (drs);
>           return false;
>         }
> ...
> 
> [ DR_NUM_DIMENSIONS (dr) is 0 as a consequence of the NULL access_fns. ]
> 
> This code labels DR_NUM_DIMENSIONS (dr) == 0 as 'data reference analysis has
> failed'.
> 
> From this perspective, it seems that the dependence handling should bail out
> once it finds a data ref with DR_NUM_DIMENSIONS (dr) == 0 (or DR_ACCESS_FNS ==
> 0).
> 
> 
> VI.
> 
> This test-case used to pass in 4.6 because in find_data_references_in_stmt we
> had:
> ...
>       /* FIXME -- data dependence analysis does not work correctly for
>          objects with invariant addresses in loop nests.  Let us fail
>          here until the problem is fixed.  */
>       if (dr_address_invariant_p (dr) && nest)
> 	{
>           free_data_ref (dr);
>           if (dump_file && (dump_flags & TDF_DETAILS))
>             fprintf (dump_file,
>                      "\tFAILED as dr address is invariant\n");
>           ret = false;
>           break;
> 	}
> ...
> 
> That FIXME was removed in the fix for PR46787, at r175704.
> 
> The test-case fails in 4.8, and I guess from there onwards.
> 
> 
> VII.
> 
> The attached patch fixes the problem by returning a zero access function for
> 'i' in dr_analyze_indices.
> 
> [ But I can also imagine a solution similar to the graphite fix:
> ...
> @@ -3997,6 +3999,12 @@ find_data_references_in_stmt
>        dr = create_data_ref (nest, loop_containing_stmt (stmt),
>                             ref->ref, stmt, ref->is_read);
>        gcc_assert (dr != NULL);
> +      if (DR_NUM_DIMENSIONS (dr) == 0)
> +       {
> +         datarefs->release ();
> +         return false;
> +       }
> +
>        datarefs->safe_push (dr);
>      }
>    references.release ();
> ...
> 
> I'm not familiar enough with the dependency analysis code to know where
> exactly this should be fixed. ]
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for trunk?
> 
> OK for release branches?

Bah - attachement [please post patches inline]

So without quote ;)

Doesnt' the same issue apply to 

> unsigned int *p;
>       
> static void __attribute__((noinline, noclone))
> foo (void)
> {         
>   unsigned int z;
>   
>   for (z = 0; z < N; ++z)
>     ++(*p);
> }

thus when we have a MEM_REF[p_1]?  SCEV will not analyze
its evolution to a POLYNOMIAL_CHREC and thus access_fns will
be NULL again.

I think avoiding a NULL access_fns is ok but it should be done
unconditionally, not only for the DECL_P case.

Thanks,
Richard.
diff mbox

Patch

Don't return NULL access_fns in dr_analyze_indices

2016-01-12  Tom de Vries  <tom@codesourcery.com>

	* tree-data-ref.c (dr_analyze_indices): Don't return NULL access_fns.

	* gcc.dg/autopar/pr69110.c: New test.

	* testsuite/libgomp.c/pr69110.c: New test.

---
 gcc/testsuite/gcc.dg/autopar/pr69110.c | 20 ++++++++++++++++++++
 gcc/tree-data-ref.c                    |  3 +++
 libgomp/testsuite/libgomp.c/pr69110.c  | 27 +++++++++++++++++++++++++++
 3 files changed, 50 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/autopar/pr69110.c b/gcc/testsuite/gcc.dg/autopar/pr69110.c
new file mode 100644
index 0000000..0eca411
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/autopar/pr69110.c
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O1 -ftree-parallelize-loops=2 -fno-tree-loop-im -fdump-tree-parloops-details" } */
+
+#define N 1000
+
+unsigned int i = 0;
+
+void
+foo (void)
+{
+  unsigned int z;
+  unsigned int *p = &i;
+  for (z = 0; z < N; ++z)
+    ++i;
+}
+
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 0 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "FAILED: data dependencies exist across iterations" 1 "parloops" } } */
+
+
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index a40f40d..862589b 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -1021,6 +1021,9 @@  dr_analyze_indices (struct data_reference *dr, loop_p nest, loop_p loop)
       ref = build2 (MEM_REF, TREE_TYPE (ref),
 		    build_fold_addr_expr (ref),
 		    build_int_cst (reference_alias_ptr_type (ref), 0));
+
+      if (access_fns == vNULL)
+	access_fns.safe_push (integer_zero_node);
     }
 
   DR_BASE_OBJECT (dr) = ref;
diff --git a/libgomp/testsuite/libgomp.c/pr69110.c b/libgomp/testsuite/libgomp.c/pr69110.c
new file mode 100644
index 0000000..049f014
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/pr69110.c
@@ -0,0 +1,27 @@ 
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=2 -O1 -fno-tree-loop-im" } */
+
+#define N 1000
+
+unsigned int i = 0;
+
+static void __attribute__((noinline, noclone))
+foo (void)
+{
+  unsigned int z;
+  unsigned int *p = &i;
+  for (z = 0; z < N; ++z)
+    ++i;
+}
+
+extern void abort (void);
+
+int
+main (void)
+{
+  foo ();
+  if (i != N)
+    abort ();
+
+  return 0;
+}