[gomp4.1] fold ordered depend(sink) clauses
diff mbox

Message ID 55B96647.20805@redhat.com
State New
Headers show

Commit Message

Aldy Hernandez July 29, 2015, 11:48 p.m. UTC
The attached patch canonicalizes sink dependence clauses into one folded 
clause if possible (as discussed in the paper "Expressing DOACROSS Loop 
Dependences in OpenMP").

The basic algorithm is to create a sink vector whose first element is 
the GCD of all the first elements, and whose remaining elements are the 
minimum of the subsequent columns.  Further explanations are included in 
the code.

I have also added further warnings/errors for incompatible and 
nonsensical sink offsets.

I suggest you start with the tests and first see if you agree with the 
folded cases.

How does this look?
commit 3f0a9dc6ceca690a77c74549a42040c52bc02fdc
Author: Aldy Hernandez <aldyh@redhat.com>
Date:   Wed Jul 29 13:39:06 2015 -0700

    	* wide-int.h (wi::gcd): New.
    	* gimplify.c (struct gimplify_omp_ctx): Rename iter_vars to
    	loop_iter_var.
    	Add loop_dir and loop_const_step fields.
    	(delete_omp_context): Free loop_dir and loop_const_step.
    	(gimplify_omp_for): Set loop_dir and loop_const_step.
    	(gimplify_expr): Move code handling OMP_ORDERED into...
    	(gimplify_omp_ordered): ...here.  New.

Comments

Jakub Jelinek July 30, 2015, 10:01 a.m. UTC | #1
On Wed, Jul 29, 2015 at 04:48:23PM -0700, Aldy Hernandez wrote:
> @@ -7490,8 +7503,12 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
>  	      == TREE_VEC_LENGTH (OMP_FOR_COND (for_stmt)));
>    gcc_assert (TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt))
>  	      == TREE_VEC_LENGTH (OMP_FOR_INCR (for_stmt)));
> -  gimplify_omp_ctxp->iter_vars.create (TREE_VEC_LENGTH
> -				       (OMP_FOR_INIT (for_stmt)));
> +  gimplify_omp_ctxp->loop_iter_var.create (TREE_VEC_LENGTH
> +					   (OMP_FOR_INIT (for_stmt)));
> +  gimplify_omp_ctxp->loop_dir.create (TREE_VEC_LENGTH
> +				      (OMP_FOR_INIT (for_stmt)));
> +  gimplify_omp_ctxp->loop_const_step.create (TREE_VEC_LENGTH
> +					     (OMP_FOR_INIT (for_stmt)));
>    for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)); i++)
>      {
>        t = TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), i);

I think the above should be guarded with
  tree c = find_omp_clause (OMP_FOR_CLAUSES (for_stmt), OMP_CLAUSE_ORDERED);
  if (c && OMP_CLAUSE_ORDERED_EXPR (c))
The most common case is that ordered(n) is not present, so we should
optimize for that.

> @@ -7501,10 +7518,10 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
>        gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (decl))
>  		  || POINTER_TYPE_P (TREE_TYPE (decl)));
>        if (TREE_CODE (for_stmt) == OMP_FOR && OMP_FOR_ORIG_DECLS (for_stmt))
> -	gimplify_omp_ctxp->iter_vars.quick_push
> +	gimplify_omp_ctxp->loop_iter_var.quick_push
>  	  (TREE_VEC_ELT (OMP_FOR_ORIG_DECLS (for_stmt), i));
>        else
> -	gimplify_omp_ctxp->iter_vars.quick_push (decl);
> +	gimplify_omp_ctxp->loop_iter_var.quick_push (decl);
>  
>        /* Make sure the iteration variable is private.  */
>        tree c = NULL_TREE;

And all these etc. pushes too, simply remember is_doacross in some bool
variable.

> @@ -8387,6 +8435,228 @@ gimplify_transaction (tree *expr_p, gimple_seq *pre_p)
>    return GS_ALL_DONE;
>  }
>  

Function comment missing.

> +static gimple
> +gimplify_omp_ordered (tree expr, gimple_seq body)
> +{

> +     The basic algorithm is to create a sink vector whose first
> +     element is the GCD of all the first elements, and whose remaining
> +     elements are the minimum of the subsequent columns.
> +
> +     We ignore dependence vectors whose first element is zero because
> +     such dependencies are known to be executed by the same thread.
> +
> +     ?? ^^^^ Is this only applicable for collapse(1) loops?  If so, how
> +     ?? to handle collapse(N) loops where N > 1?

For collapse(N) N > 1, you can't ignore first iter var with 0 offset, you can only
ignore if N first iter vars have 0 offset.

Pretty much for the purpose of the algorithm, you compute "first element"
for collapse(N) N > 1 and ordered(5-N)
	for (iv1 = ..; iv1 < ..; iv1 += step1)
	  for (iv2 = ..; iv2 < ..; iv2 += step2)
	    for (iv3 = ..; iv3 < ..; iv3 += step3)
	      for (iv4 = ..; iv4 < ..; iv4 += step4)
		depend(iv1 + off1, iv2 + off2, iv3 + off3, iv4 + off4)
as: off(N) + off(N-1)*step(N) + iv(N-2)*step(N)*step(N-1)...
(to be checked the case if the directions differ).
So basically, you want to precompute if you'd add some loop counter
in between loop(N) and loop(N+1) that would be initially zero and incremented
by step(N).  The GCD is then performed on this, it is compared to zero,
and then finally split again into the several offsets.
The "is this invalid iteration" check (is offN divisible by stepN)
needs to be done before the merging of course.

Except, now that I think, it is not that easy.  Because we have another
test for "is this invalid iteration", in particularly one performed at
runtime on each of the depends.  Say if offN is negative and loop(N)
is forward loop (thus stepN is positive (note, for backward loop stepN
still might be huge positive value for unsigned iterators)), the check
would be if (ivN + offN >= initialN), for forward loop with positive offN
if (ivN + offN < endvalueN) etc.  Now, not sure if by computing the GCD and
merging several depend clauses into one we preserve those tests or not.

All in all, I think it might be really better to do the depend clause
merging during omp lowering in omp-low.c, where you can call
extract_omp_for_data and inspect the iteration variables, have the steps
computed for you etc.  That is the spot where we probably want to emit some
GOMP_depend_sink call (and GOMP_depend_source) and prepare arguments for it.

> +void
> +funk ()
> +{
> +#pragma omp parallel for ordered(2)
> +  for (i=0; i < N; i++)
> +    for (j=0; j < N; ++j)
> +    {
> +/* We should keep the (sink:i,j-2) by virtue of it the i+0.  The
> +   remaining clauses get folded with a GCD of -2 for `i' and a minimum
> +   of -1 for 'j'.  */

I think we shouldn't keep the useless one (sink: i, j-2) (or keep invalid
ones).  Of course, if we remove all depend clauses, we need to remove the
GIMPLE_OMP_ORDERED stmt altogether.

	Jakub

Patch
diff mbox

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 2331001..5262233 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -153,7 +153,18 @@  struct gimplify_omp_ctx
   splay_tree variables;
   hash_set<tree> *privatized_types;
   /* Iteration variables in an OMP_FOR.  */
-  vec<tree> iter_vars;
+  vec<tree> loop_iter_var;
+
+  /* Direction of loop in an OMP_FOR.  */
+  enum dir {
+    DIR_UNKNOWN,
+    DIR_FORWARD,
+    DIR_BACKWARD
+  };
+  vec<dir> loop_dir;
+
+  /* Absolute value of step.  NULL_TREE if non-constant.  */
+  vec<tree> loop_const_step;
   location_t location;
   enum omp_clause_default_kind default_kind;
   enum omp_region_type region_type;
@@ -392,7 +403,9 @@  delete_omp_context (struct gimplify_omp_ctx *c)
 {
   splay_tree_delete (c->variables);
   delete c->privatized_types;
-  c->iter_vars.release ();
+  c->loop_iter_var.release ();
+  c->loop_dir.release ();
+  c->loop_const_step.release ();
   XDELETE (c);
 }
 
@@ -7490,8 +7503,12 @@  gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 	      == TREE_VEC_LENGTH (OMP_FOR_COND (for_stmt)));
   gcc_assert (TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt))
 	      == TREE_VEC_LENGTH (OMP_FOR_INCR (for_stmt)));
-  gimplify_omp_ctxp->iter_vars.create (TREE_VEC_LENGTH
-				       (OMP_FOR_INIT (for_stmt)));
+  gimplify_omp_ctxp->loop_iter_var.create (TREE_VEC_LENGTH
+					   (OMP_FOR_INIT (for_stmt)));
+  gimplify_omp_ctxp->loop_dir.create (TREE_VEC_LENGTH
+				      (OMP_FOR_INIT (for_stmt)));
+  gimplify_omp_ctxp->loop_const_step.create (TREE_VEC_LENGTH
+					     (OMP_FOR_INIT (for_stmt)));
   for (i = 0; i < TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)); i++)
     {
       t = TREE_VEC_ELT (OMP_FOR_INIT (for_stmt), i);
@@ -7501,10 +7518,10 @@  gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
       gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (decl))
 		  || POINTER_TYPE_P (TREE_TYPE (decl)));
       if (TREE_CODE (for_stmt) == OMP_FOR && OMP_FOR_ORIG_DECLS (for_stmt))
-	gimplify_omp_ctxp->iter_vars.quick_push
+	gimplify_omp_ctxp->loop_iter_var.quick_push
 	  (TREE_VEC_ELT (OMP_FOR_ORIG_DECLS (for_stmt), i));
       else
-	gimplify_omp_ctxp->iter_vars.quick_push (decl);
+	gimplify_omp_ctxp->loop_iter_var.quick_push (decl);
 
       /* Make sure the iteration variable is private.  */
       tree c = NULL_TREE;
@@ -7670,6 +7687,23 @@  gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
       t = TREE_VEC_ELT (OMP_FOR_COND (for_stmt), i);
       gcc_assert (COMPARISON_CLASS_P (t));
       gcc_assert (TREE_OPERAND (t, 0) == decl);
+      switch (TREE_CODE (t))
+	{
+	case LT_EXPR:
+	case LE_EXPR:
+	  gimplify_omp_ctxp->loop_dir.quick_push
+	    (gimplify_omp_ctx::DIR_FORWARD);
+	  break;
+	case GT_EXPR:
+	case GE_EXPR:
+	  gimplify_omp_ctxp->loop_dir.quick_push
+	    (gimplify_omp_ctx::DIR_BACKWARD);
+	  break;
+	default:
+	  gimplify_omp_ctxp->loop_dir.quick_push
+	    (gimplify_omp_ctx::DIR_UNKNOWN);
+	  break;
+	}
 
       tret = gimplify_expr (&TREE_OPERAND (t, 1), &for_pre_body, NULL,
 			    is_gimple_val, fb_rvalue);
@@ -7687,6 +7721,11 @@  gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 	       called to massage things appropriately.  */
 	    gcc_assert (!POINTER_TYPE_P (TREE_TYPE (decl)));
 
+	    /* Pointer increments have been adjusted by now, so p++
+	       should be p += SIZE, and handled by the MODIFY_EXPR
+	       case below.  */
+	    gimplify_omp_ctxp->loop_const_step.quick_push (integer_one_node);
+
 	    if (orig_for_stmt != for_stmt)
 	      break;
 	    t = build_int_cst (TREE_TYPE (decl), 1);
@@ -7703,6 +7742,7 @@  gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 	  /* c_omp_for_incr_canonicalize_ptr() should have been
 	     called to massage things appropriately.  */
 	  gcc_assert (!POINTER_TYPE_P (TREE_TYPE (decl)));
+	  gimplify_omp_ctxp->loop_const_step.quick_push (integer_one_node);
 	  if (orig_for_stmt != for_stmt)
 	    break;
 	  t = build_int_cst (TREE_TYPE (decl), -1);
@@ -7738,6 +7778,14 @@  gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 	      gcc_unreachable ();
 	    }
 
+	  /* We only care about steps that can be determined to be
+	     constant at compile time.  */
+	  if (TREE_CODE (TREE_OPERAND (t, 1)) == INTEGER_CST)
+	    gimplify_omp_ctxp->loop_const_step.quick_push
+	      (TREE_OPERAND (t, 1));
+	  else
+	    gimplify_omp_ctxp->loop_const_step.quick_push (NULL_TREE);
+
 	  tret = gimplify_expr (&TREE_OPERAND (t, 1), &for_pre_body, NULL,
 				is_gimple_val, fb_rvalue);
 	  ret = MIN (ret, tret);
@@ -8387,6 +8435,228 @@  gimplify_transaction (tree *expr_p, gimple_seq *pre_p)
   return GS_ALL_DONE;
 }
 
+static gimple
+gimplify_omp_ordered (tree expr, gimple_seq body)
+{
+  tree c, decls;
+  int failures = 0;
+  unsigned int i;
+
+  if (gimplify_omp_ctxp)
+    for (c = OMP_ORDERED_CLAUSES (expr); c; c = OMP_CLAUSE_CHAIN (c))
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEPEND
+	  && OMP_CLAUSE_DEPEND_KIND (c) == OMP_CLAUSE_DEPEND_SINK)
+	{
+	  bool fail = false;
+	  for (decls = OMP_CLAUSE_DECL (c), i = 0;
+	       decls && TREE_CODE (decls) == TREE_LIST;
+	       decls = TREE_CHAIN (decls), ++i)
+	    if (i < gimplify_omp_ctxp->loop_iter_var.length ()
+		&& TREE_VALUE (decls) != gimplify_omp_ctxp->loop_iter_var[i])
+	      {
+		error_at (OMP_CLAUSE_LOCATION (c),
+			  "variable %qE is not an iteration "
+			  "of outermost loop %d, expected %qE",
+			  TREE_VALUE (decls), i + 1,
+			  gimplify_omp_ctxp->loop_iter_var[i]);
+		fail = true;
+		failures++;
+	      }
+	  /* Avoid being too redundant.  */
+	  if (!fail && i != gimplify_omp_ctxp->loop_iter_var.length ())
+	    {
+	      error_at (OMP_CLAUSE_LOCATION (c),
+			"number of variables in depend(sink) "
+			"clause does not match number of "
+			"iteration variables");
+	      failures++;
+	    }
+	}
+
+  if (failures
+      || !gimplify_omp_ctxp
+      || gimplify_omp_ctxp->loop_iter_var.length () == 0)
+    return gimple_build_omp_ordered (body, OMP_ORDERED_CLAUSES (expr));
+
+  /* Canonicalize sink dependence clauses into one folded clause if
+     possible.
+
+     The basic algorithm is to create a sink vector whose first
+     element is the GCD of all the first elements, and whose remaining
+     elements are the minimum of the subsequent columns.
+
+     We ignore dependence vectors whose first element is zero because
+     such dependencies are known to be executed by the same thread.
+
+     ?? ^^^^ Is this only applicable for collapse(1) loops?  If so, how
+     ?? to handle collapse(N) loops where N > 1?
+
+     We take into account the direction of the loop, so a minimum
+     becomes a maximum if the loop is iterating backwards.  We also
+     ignore sink clauses where the loop direction is unknown, or where
+     the offsets are clearly invalid because they are not a multiple
+     of the loop increment.
+
+     For example:
+
+	     for (i=0; i < N; ++i)
+		depend(sink:i-8,j-1)
+		depend(sink:i,j-2)	// Completely ignored because i+0.
+		depend(sink:i-4,j+3)
+		depend(sink:i-6,j+2)
+
+     Folded clause is:
+
+        depend(sink:-gcd(8,4,6),min(-1,3,2))
+	  -or-
+	depend(sink:-2,-1)
+  */
+
+  unsigned int len = gimplify_omp_ctxp->loop_iter_var.length ();
+  vec<wide_int> folded_deps;
+  folded_deps.create (len);
+  folded_deps.quick_grow_cleared (len);
+  /* TRUE if the first dimension's offset is negative.  */
+  bool neg_offset_p = false;
+
+  /* Bitmap representing dimensions in the final dependency vector that
+     have been set.  */
+  sbitmap folded_deps_used = sbitmap_alloc (len);
+  bitmap_clear (folded_deps_used);
+
+  tree *list_p = &OMP_ORDERED_CLAUSES (expr);
+  while ((c = *list_p) != NULL)
+    {
+      bool remove = false;
+
+      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_DEPEND
+	  || OMP_CLAUSE_DEPEND_KIND (c) != OMP_CLAUSE_DEPEND_SINK)
+	goto next_ordered_clause;
+
+      for (decls = OMP_CLAUSE_DECL (c), i = 0;
+	   decls && TREE_CODE (decls) == TREE_LIST;
+	   decls = TREE_CHAIN (decls), ++i)
+	{
+	  gcc_assert (i < len);
+
+	  /* While the committee makes up its mind, bail if we have any
+	     non-constant steps.  Also, bail if any loop has an unknown
+	     direction.  */
+	  if (!gimplify_omp_ctxp->loop_const_step[i]
+	      || (gimplify_omp_ctxp->loop_dir[i]
+		  == gimplify_omp_ctx::DIR_UNKNOWN))
+	    goto gimplify_omp_ordered_ret;
+
+	  wide_int offset = TREE_PURPOSE (decls);
+
+	  /* Ignore invalid offsets that are not multiples of the step.  */
+	  if (!wi::multiple_of_p
+	      (wi::abs (offset),
+	       wi::abs ((wide_int)
+			gimplify_omp_ctxp->loop_const_step[i]),
+	       UNSIGNED))
+	    {
+	      warning_at (OMP_CLAUSE_LOCATION (c), 0,
+			  "ignoring sink clause with offset that is not "
+			  "a multiple of the loop step");
+	      remove = true;
+	      goto next_ordered_clause;
+	    }
+
+	  /* Calculate the first dimension.  The first dimension of
+	     the folded dependency vector is the GCD of the first
+	     elements, while ignoring any first elements whose offset
+	     is 0.  */
+	  if (i == 0)
+	    {
+	      /* Ignore dependence vectors whose first dimension is 0.  */
+	      if (offset == 0)
+		goto next_ordered_clause;
+	      else
+		{
+		  neg_offset_p =
+		    wi::neg_p (offset,
+			       TYPE_SIGN (TREE_TYPE (TREE_PURPOSE (decls))));
+		  if ((gimplify_omp_ctxp->loop_dir[0]
+		       == gimplify_omp_ctx::DIR_FORWARD && !neg_offset_p)
+		      || (gimplify_omp_ctxp->loop_dir[0]
+			  == gimplify_omp_ctx::DIR_BACKWARD && neg_offset_p))
+		    {
+		      error_at (OMP_CLAUSE_LOCATION (c),
+				"first offset must be in opposite direction "
+				"of loop iterations");
+		      goto gimplify_omp_ordered_ret;
+		    }
+		  /* Initialize the first time around.  */
+		  if (!bitmap_bit_p (folded_deps_used, 0))
+		    {
+		      bitmap_set_bit (folded_deps_used, 0);
+		      folded_deps[0] = wi::abs (offset);
+		    }
+		  else
+		    folded_deps[i] = wi::gcd (folded_deps[0], offset, UNSIGNED);
+		}
+	    }
+	  /* Calculate minimum for the remaining dimensions.  */
+	  else
+	    {
+	      if (!bitmap_bit_p (folded_deps_used, i))
+		{
+		  bitmap_set_bit (folded_deps_used, i);
+		  folded_deps[i] = offset;
+		}
+	      else if ((gimplify_omp_ctxp->loop_dir[i]
+			== gimplify_omp_ctx::DIR_FORWARD
+			&& wi::lts_p (offset, folded_deps[i]))
+		       || (gimplify_omp_ctxp->loop_dir[i]
+			   == gimplify_omp_ctx::DIR_BACKWARD
+			   && wi::gts_p (offset, folded_deps[i])))
+		folded_deps[i] = offset;
+	    }
+	}
+
+      remove = true;
+
+    next_ordered_clause:
+      if (remove)
+	*list_p = OMP_CLAUSE_CHAIN (c);
+      else
+	list_p = &OMP_CLAUSE_CHAIN (c);
+    }
+
+  for (i = 0; i < len; ++i)
+    if (!bitmap_bit_p (folded_deps_used, i))
+      break;
+  if (i == len)
+    {
+      if (neg_offset_p)
+	folded_deps[0] = -folded_deps[0];
+
+      tree vec = NULL;
+      i = len;
+      do
+	{
+	  i--;
+	  tree var = gimplify_omp_ctxp->loop_iter_var[i];
+	  vec = tree_cons (wide_int_to_tree (TREE_TYPE (var), folded_deps[i]),
+			   var, vec);
+	}
+      while (i);
+
+      c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_DEPEND);
+      OMP_CLAUSE_DEPEND_KIND (c) = OMP_CLAUSE_DEPEND_SINK;
+      OMP_CLAUSE_DECL (c) = vec;
+      OMP_CLAUSE_CHAIN (c) = OMP_ORDERED_CLAUSES (expr);
+      OMP_ORDERED_CLAUSES (expr) = c;
+    }
+
+ gimplify_omp_ordered_ret:
+  sbitmap_free (folded_deps_used);
+  folded_deps.release ();
+
+  return gimple_build_omp_ordered (body, OMP_ORDERED_CLAUSES (expr));
+}
+
 /* Convert the GENERIC expression tree *EXPR_P to GIMPLE.  If the
    expression produces a value to be used as an operand inside a GIMPLE
    statement, the value will be stored back in *EXPR_P.  This value will
@@ -9200,38 +9470,7 @@  gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 		}
 		break;
 	      case OMP_ORDERED:
-		if (gimplify_omp_ctxp)
-		  for (tree c = OMP_ORDERED_CLAUSES (*expr_p);
-		       c; c = OMP_CLAUSE_CHAIN (c))
-		    if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEPEND
-			&& OMP_CLAUSE_DEPEND_KIND (c) == OMP_CLAUSE_DEPEND_SINK)
-		      {
-			unsigned int n = 0;
-			bool fail = false;
-			for (tree decls = OMP_CLAUSE_DECL (c);
-			     decls && TREE_CODE (decls) == TREE_LIST;
-			     decls = TREE_CHAIN (decls), ++n)
-			  if (n < gimplify_omp_ctxp->iter_vars.length ()
-			      && TREE_VALUE (decls)
-			      != gimplify_omp_ctxp->iter_vars[n])
-			    {
-			      error_at (OMP_CLAUSE_LOCATION (c),
-					"variable %qE is not an iteration "
-					"of outermost loop %d, expected %qE",
-					TREE_VALUE (decls), n + 1,
-					gimplify_omp_ctxp->iter_vars[n]);
-			      fail = true;
-			    }
-			/* Avoid being too redundant.  */
-			if (!fail
-			    && n != gimplify_omp_ctxp->iter_vars.length ())
-			  error_at (OMP_CLAUSE_LOCATION (c),
-			     "number of variables in depend(sink) clause "
-			     "does not match number of iteration variables");
-		      }
-
-		g = gimple_build_omp_ordered (body,
-					      OMP_ORDERED_CLAUSES (*expr_p));
+		g = gimplify_omp_ordered (*expr_p, body);
 		break;
 	      case OMP_CRITICAL:
 		gimplify_scan_omp_clauses (&OMP_CRITICAL_CLAUSES (*expr_p),
diff --git a/gcc/testsuite/c-c++-common/gomp/sink-4.c b/gcc/testsuite/c-c++-common/gomp/sink-4.c
index 7934de2..a34f081 100644
--- a/gcc/testsuite/c-c++-common/gomp/sink-4.c
+++ b/gcc/testsuite/c-c++-common/gomp/sink-4.c
@@ -22,4 +22,4 @@  funk (foo *begin, foo *end)
     }
 }
 
-/* { dg-final { scan-tree-dump-times "depend\\(sink:p\\+400\\)" 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "depend\\(sink:p\\+400.\\)" 1 "gimple" } } */
diff --git a/gcc/testsuite/gcc.dg/gomp/sink-fold-1.c b/gcc/testsuite/gcc.dg/gomp/sink-fold-1.c
new file mode 100644
index 0000000..64ddc92
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/sink-fold-1.c
@@ -0,0 +1,30 @@ 
+/* { dg-do compile } */
+/* { dg-options "-fopenmp -fdump-tree-gimple" } */
+
+/* Test depend(sink) clause folding.  */
+
+int i,j, N;
+
+extern void bar();
+
+void
+funk ()
+{
+#pragma omp parallel for ordered(2)
+  for (i=0; i < N; i++)
+    for (j=0; j < N; ++j)
+    {
+/* We should keep the (sink:i,j-2) by virtue of it the i+0.  The
+   remaining clauses get folded with a GCD of -2 for `i' and a minimum
+   of -1 for 'j'.  */
+#pragma omp ordered \
+  depend(sink:i-8,j-1) \
+  depend(sink:i, j-2) \
+  depend(sink:i-4,j+3) \
+  depend(sink:i-6,j+2)
+        bar();
+#pragma omp ordered depend(source)
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "omp ordered depend\\(sink:i-2,j-1\\) depend\\(sink:i,j-2\\)" 1 "gimple" } } */
diff --git a/gcc/testsuite/gcc.dg/gomp/sink-fold-2.c b/gcc/testsuite/gcc.dg/gomp/sink-fold-2.c
new file mode 100644
index 0000000..9b3c442
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/sink-fold-2.c
@@ -0,0 +1,28 @@ 
+/* { dg-do compile } */
+
+int i,j, N;
+
+extern void bar();
+
+void
+funk ()
+{
+#pragma omp parallel for ordered(2)
+  for (i=0; i < N; i += 3)
+    for (j=0; j < N; ++j)
+    {
+#pragma omp ordered depend(sink:i-8,j-1) /* { dg-warning "ignoring sink clause with offset that is not a multiple" } */
+#pragma omp ordered depend(sink:i+3,j-1) /* { dg-error "first offset must be in opposite direction" } */
+        bar();
+#pragma omp ordered depend(source)
+    }
+
+#pragma omp parallel for ordered(2)
+  for (i=N; i > 0; --i)
+    for (j=0; j < N; ++j)
+      {
+#pragma omp ordered depend(sink:i-1,j-1) /* { dg-error "first offset must be in opposite direction" } */
+	bar();
+#pragma omp ordered depend(source)
+      }
+}
diff --git a/gcc/testsuite/gcc.dg/gomp/sink-fold-3.c b/gcc/testsuite/gcc.dg/gomp/sink-fold-3.c
new file mode 100644
index 0000000..79cb5f3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gomp/sink-fold-3.c
@@ -0,0 +1,25 @@ 
+/* { dg-do compile } */
+/* { dg-options "-fopenmp -fdump-tree-gimple" } */
+
+/* Test that we fold sink offsets correctly while taking into account
+   pointer sizes.  */
+
+typedef struct {
+    char stuff[400];
+} foo;
+
+void
+funk (foo *begin, foo *end)
+{
+  foo *p;
+#pragma omp parallel for ordered(1)
+  for (p=end; p > begin; p--)
+    {
+#pragma omp ordered depend(sink:p+2) depend(sink:p+4)
+      void bar ();
+        bar();
+#pragma omp ordered depend(source)
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "depend\\(sink:p\\+800B\\)" 1 "gimple" } } */
diff --git a/gcc/wide-int.h b/gcc/wide-int.h
index d8f7b46..c20db61 100644
--- a/gcc/wide-int.h
+++ b/gcc/wide-int.h
@@ -514,6 +514,7 @@  namespace wi
   BINARY_FUNCTION div_round (const T1 &, const T2 &, signop, bool * = 0);
   BINARY_FUNCTION divmod_trunc (const T1 &, const T2 &, signop,
 				WI_BINARY_RESULT (T1, T2) *);
+  BINARY_FUNCTION gcd (const T1 &, const T2 &, signop = UNSIGNED);
   BINARY_FUNCTION mod_trunc (const T1 &, const T2 &, signop, bool * = 0);
   BINARY_FUNCTION smod_trunc (const T1 &, const T2 &);
   BINARY_FUNCTION umod_trunc (const T1 &, const T2 &);
@@ -2653,6 +2654,27 @@  wi::divmod_trunc (const T1 &x, const T2 &y, signop sgn,
   return quotient;
 }
 
+/* Compute the greatest common divisor of two numbers A and B using
+   Euclid's algorithm.  */
+template <typename T1, typename T2>
+inline WI_BINARY_RESULT (T1, T2)
+wi::gcd (const T1 &a, const T2 &b, signop sgn)
+{
+  T1 x, y, z;
+
+  x = wi::abs (a);
+  y = wi::abs (b);
+
+  while (gt_p (x, 0, sgn))
+    {
+      z = mod_trunc (y, x, sgn);
+      y = x;
+      x = z;
+    }
+
+  return y;
+}
+
 /* Compute X / Y, rouding towards 0, and return the remainder.
    Treat X and Y as having the signedness given by SGN.  Indicate
    in *OVERFLOW if the division overflows.  */