diff mbox series

[OG10] Fortran: delinearize multi-dimensional array accesses

Message ID a3959346-3e90-b30f-6ebe-bd89ebdf43e2@codesourcery.com
State New
Headers show
Series [OG10] Fortran: delinearize multi-dimensional array accesses | expand

Commit Message

Sandra Loosemore Dec. 17, 2020, 4:49 p.m. UTC
The attached patch implements delinearization of array accesses in the 
Fortran front end, something that has been discussed for a long time.

I've been asked to try to get this patch committed on the OG10 branch 
since it is blocking some further optimization work with Graphite for 
OpenACC kernels regions.  I have a mainline version of this patch as 
well that I can send to anyone interested in trying it out, but TBH, I 
don't think this is ready for mainline yet.  The current status is that 
there are still two gfortran tests that are regressing 
(gfortran.dg/graphite/id-9.f and 
gfortran.dg/vect/fast-math-mgrid-resid.f), and while it's been confirmed 
that this helps with Graphite optimizations as intended, we haven't yet 
run any benchmarks to confirm that it doesn't make other things slower. 
(It might, for instance, be appropriate to only delinearize when 
Graphite optimizations are also enabled.)

Anyway, I thought that for the OG10 branch I ought to also put this out 
for review/objection instead of just pushing it immediately.  Unless 
somebody explicitly approves the patch meanwhile, I will probably hold 
off until after the first of the year before pushing it, since many of 
us are going to be away until then anyway.

-Sandra

Comments

Thomas Koenig Dec. 26, 2020, 10:41 a.m. UTC | #1
Hi Sandra,

> The attached patch implements delinearization of array accesses in the 
> Fortran front end, something that has been discussed for a long time.

Definitely - among others, this is the subject of PR 14741, which is by
now quite historic.

> I've been asked to try to get this patch committed on the OG10 branch 
> since it is blocking some further optimization work with Graphite for 
> OpenACC kernels regions.  I have a mainline version of this patch as 
> well that I can send to anyone interested in trying it out, but TBH, I 
> don't think this is ready for mainline yet.

That would be indeed interesting.  Could you post that to the list as
well?

The current status is that
> there are still two gfortran tests that are regressing 
> (gfortran.dg/graphite/id-9.f and 
> gfortran.dg/vect/fast-math-mgrid-resid.f), and while it's been confirmed 
> that this helps with Graphite optimizations as intended, we haven't yet 
> run any benchmarks to confirm that it doesn't make other things slower. 

It is probably too late; this could go in for the next stage 1.

> (It might, for instance, be appropriate to only delinearize when 
> Graphite optimizations are also enabled.)

Makes sense.

Does the patch actually make loop interchange for matrix multiplication
work (done with C for loops or Fortran DO loops), or is there additional
work required?

Regarding scalarized loops: We still to not collapse loops for

   subroutine foo(a)
     real, dimension(:,:), contiguous :: a
     a = 5.
   end subroutine foo

so an extension to scalarized loops would be quite valuable.

It would be interesting to see if

   subroutine foo(a,n,m)
     real, dimension(n,m) :: a
     do j=1,m
       do i=1,n
         a(i,j) = 5.
       end do
     end do
   end subroutine foo

is collapsed to a single loop with the patch and Graphite.

Best regards

	Thomas
Sandra Loosemore Jan. 7, 2021, 7:35 p.m. UTC | #2
On 12/26/20 3:41 AM, Thomas Koenig wrote:
> 
> Hi Sandra,
> 
>> The attached patch implements delinearization of array accesses in the 
>> Fortran front end, something that has been discussed for a long time.
> 
> Definitely - among others, this is the subject of PR 14741, which is by
> now quite historic.
> 
>> I've been asked to try to get this patch committed on the OG10 branch 
>> since it is blocking some further optimization work with Graphite for 
>> OpenACC kernels regions.  I have a mainline version of this patch as 
>> well that I can send to anyone interested in trying it out, but TBH, I 
>> don't think this is ready for mainline yet.
> 
> That would be indeed interesting.  Could you post that to the list as
> well?

Attached to this mail.

> The current status is that
>> there are still two gfortran tests that are regressing 
>> (gfortran.dg/graphite/id-9.f and 
>> gfortran.dg/vect/fast-math-mgrid-resid.f), and while it's been 
>> confirmed that this helps with Graphite optimizations as intended, we 
>> haven't yet run any benchmarks to confirm that it doesn't make other 
>> things slower. 
> 
> It is probably too late; this could go in for the next stage 1.

Yes, I hope we can get this more polished by that time.

> Does the patch actually make loop interchange for matrix multiplication
> work (done with C for loops or Fortran DO loops), or is there additional
> work required?

There's some additional work required, and probably not by me since I 
know next to nothing about graphite or GCC's loop optimization framework 
generally.  :-(  (The motivation for this patch at this time is that we 
need the delinearization for some ongoing OpenACC parallelization work.)

> Regarding scalarized loops: We still to not collapse loops for
> 
>    subroutine foo(a)
>      real, dimension(:,:), contiguous :: a
>      a = 5.
>    end subroutine foo
> 
> so an extension to scalarized loops would be quite valuable.

Yes.  I guesstimated that would be a medium-sized project since the 
scalarized references are generated by a completely different code path.

> It would be interesting to see if
> 
>    subroutine foo(a,n,m)
>      real, dimension(n,m) :: a
>      do j=1,m
>        do i=1,n
>          a(i,j) = 5.
>        end do
>      end do
>    end subroutine foo
> 
> is collapsed to a single loop with the patch and Graphite.

This one also requires some additional work.  At least Graphite 
recognizes the SCoP around the loop nest with this patch which it did 
not do without delinearization.

-Sandra
diff mbox series

Patch

commit bde348ad056b57f97d581205e75d083b17a8d69c
Author: Sandra Loosemore <sandra@codesourcery.com>
Date:   Thu Dec 17 08:13:00 2020 -0800

    Fortran: delinearize multi-dimensional array accesses
    
    The Fortran front end presently linearizes accesses to
    multi-dimensional arrays by combining the indices for the various
    dimensions into a series of explicit multiplies and adds with
    refactoring to allow CSE of invariant parts of the computation.
    Unfortunately this representation interferes with Graphite-based loop
    optimizations.  It is difficult to recover the original
    multi-dimensional form of the access by the time loop optimizations
    run because parts of it have already been optimized away or into a
    form that is not easily recognizable, so it seems better to have the
    Fortran front end produce delinearized accesses to begin with, a set
    of nested ARRAY_REFs similar to the existing behavior of the C and C++
    front ends.  This is a long-standing problem that has previously been
    discussed e.g. in PR 14741 and PR61000.
    
    This patch is an initial implementation for explicit array accesses
    only; it doesn't handle the accesses generated during scalarization of
    whole-array or array-section operations, which follow a different code
    path.
    
    2020-12-17  Sandra Loosemore  <sandra@codesourcery.com>
    	    Tobias Burnus  <tobias@codesourcery.com>
    
    	gcc/
    	* expr.c (get_inner_reference): Handle NOP_EXPR like
    	VIEW_CONVERT_EXPR.
    
    	gcc/fortran/
    	* lang.opt (-param=delinearize=): New.
    	* trans-array.c (get_class_array_vptr): New, split from...
    	(build_array_ref): ...here.
    	(get_array_lbound, get_array_ubound): New, split from...
    	(gfc_conv_array_ref): ...here.  Additional code refactoring
    	plus support for delinearization of the array access.
    
    	gcc/testsuite/
    	* gfortran.dg/assumed_type_2.f90: Adjust patterns.
    	* gfortran.dg/graphite/block-3.f90: Remove xfails.
    	* gfortran.dg/graphite/block-4.f90: Likewise.
    	* gfortran.dg/inline_matmul_24.f90: Adjust patterns.
    	* gfortran.dg/no_arg_check_2.f90: Likewise.
    	* gfortran.dg/pr32921.f: Likewise.
    	* gfortran.dg/reassoc_4.f: Disable delinearization for this test.

diff --git a/gcc/expr.c b/gcc/expr.c
index b2aa23e..c450ce7 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7383,6 +7383,7 @@  get_inner_reference (tree exp, poly_int64_pod *pbitsize,
 	  break;
 
 	case VIEW_CONVERT_EXPR:
+	case NOP_EXPR:
 	  break;
 
 	case MEM_REF:
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index f37242c..fb0e72d 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -513,6 +513,10 @@  fdefault-real-16
 Fortran Var(flag_default_real_16)
 Set the default real kind to an 16 byte wide type.
 
+-param=delinearize=
+Common Joined UInteger Var(flag_delinearize_aref) Init(1) IntegerRange(0,1) Param Optimization
+Delinearize array references.
+
 fdollar-ok
 Fortran Var(flag_dollar_ok)
 Allow dollar signs in entity names.
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index a3b3ac9..54773c0 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -3557,11 +3557,9 @@  add_to_offset (tree *cst_offset, tree *offset, tree t)
     }
 }
 
-
 static tree
-build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+get_class_array_vptr (tree desc, tree vptr)
 {
-  tree tmp;
   tree type;
   tree cdesc;
 
@@ -3585,19 +3583,74 @@  build_array_ref (tree desc, tree offset, tree decl, tree vptr)
 	  && GFC_CLASS_TYPE_P (TYPE_CANONICAL (type)))
 	vptr = gfc_class_vptr_get (TREE_OPERAND (cdesc, 0));
     }
+  return vptr;
+}
 
+static tree
+build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+{
+  tree tmp;
+  vptr = get_class_array_vptr (desc, vptr);
   tmp = gfc_conv_array_data (desc);
   tmp = build_fold_indirect_ref_loc (input_location, tmp);
   tmp = gfc_build_array_ref (tmp, offset, decl, vptr);
   return tmp;
 }
 
+/* Get the declared lower bound for rank N of array DECL which might
+   be either a bare array or a descriptor.  This differs from
+   gfc_conv_array_lbound because it gets information for temporary array
+   objects from AR instead of the descriptor (they can differ).  */
+
+static tree
+get_array_lbound (tree decl, int n, gfc_symbol *sym,
+		  gfc_array_ref *ar, gfc_se *se)
+{
+  if (sym->attr.temporary)
+    {
+      gfc_se tmpse;
+      gfc_init_se (&tmpse, se);
+      gfc_conv_expr_type (&tmpse, ar->as->lower[n], gfc_array_index_type);
+      gfc_add_block_to_block (&se->pre, &tmpse.pre);
+      return tmpse.expr;
+    }
+  else
+    return gfc_conv_array_lbound (decl, n);
+}
+
+/* Similarly for the upper bound.  */
+static tree
+get_array_ubound (tree decl, int n, gfc_symbol *sym,
+		  gfc_array_ref *ar, gfc_se *se)
+{
+  if (sym->attr.temporary)
+    {
+      gfc_se tmpse;
+      gfc_init_se (&tmpse, se);
+      gfc_conv_expr_type (&tmpse, ar->as->upper[n], gfc_array_index_type);
+      gfc_add_block_to_block (&se->pre, &tmpse.pre);
+      return tmpse.expr;
+    }
+  else
+    return gfc_conv_array_ubound (decl, n);
+}
+
 
 /* Build an array reference.  se->expr already holds the array descriptor.
    This should be either a variable, indirect variable reference or component
    reference.  For arrays which do not have a descriptor, se->expr will be
    the data pointer.
-   a(i, j, k) = base[offset + i * stride[0] + j * stride[1] + k * stride[2]]*/
+
+   There are two strategies here.  In the traditional case, multidimensional
+   arrays are explicitly linearized into a one-dimensional array, with the
+   index computed as if by
+   a(i, j, k) = base[offset + i * stride[0] + j * stride[1] + k * stride[2]]
+
+   However, we can often get better code using the Graphite framework
+   and scalar evolutions in the middle end, which expects to see
+   multidimensional array accesses represented as nested ARRAY_REFs, similar
+   to what the C/C++ front ends produce.  Delinearization is controlled
+   by flag_delinearize_aref.  */
 
 void
 gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
@@ -3608,11 +3661,15 @@  gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
   tree tmp;
   tree stride;
   tree decl = NULL_TREE;
+  tree cooked_decl = NULL_TREE;
+  tree vptr = se->class_vptr;
   gfc_se indexse;
-  gfc_se tmpse;
   gfc_symbol * sym = expr->symtree->n.sym;
   char *var_name = NULL;
+  tree aref = NULL_TREE;
+  tree atype = NULL_TREE;
 
+  /* Handle coarrays.  */
   if (ar->dimen == 0)
     {
       gcc_assert (ar->codimen || sym->attr.select_rank_temporary
@@ -3672,11 +3729,147 @@  gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
 	}
     }
 
-  cst_offset = offset = gfc_index_zero_node;
-  add_to_offset (&cst_offset, &offset, gfc_conv_array_offset (se->expr));
+  /* Per comments above, DECL is not always a declaration.  It may be
+     either a variable, indirect variable reference, or component
+     reference.  It may have array or pointer type, or it may be a
+     descriptor with RECORD_TYPE.  */
+  decl = se->expr;
+
+  /* A pointer array component can be detected from its field decl. Fix
+     the descriptor, mark the resulting variable decl and store it in
+     COOKED_DECL to pass to gfc_build_array_ref.  */
+  if (get_CFI_desc (sym, expr, &cooked_decl, ar))
+    cooked_decl = build_fold_indirect_ref_loc (input_location, cooked_decl);
+  if (!expr->ts.deferred && !sym->attr.codimension
+      && is_pointer_array (se->expr))
+    {
+      if (TREE_CODE (se->expr) == COMPONENT_REF)
+	cooked_decl = se->expr;
+      else if (TREE_CODE (se->expr) == INDIRECT_REF)
+	cooked_decl = TREE_OPERAND (se->expr, 0);
+      else
+	cooked_decl = se->expr;
+    }
+  else if (expr->ts.deferred
+	   || (sym->ts.type == BT_CHARACTER
+	       && sym->attr.select_type_temporary))
+    {
+      if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (se->expr)))
+	{
+	  cooked_decl = se->expr;
+	  if (TREE_CODE (cooked_decl) == INDIRECT_REF)
+	    cooked_decl = TREE_OPERAND (cooked_decl, 0);
+	}
+      else
+	cooked_decl = sym->backend_decl;
+    }
+  else if (sym->ts.type == BT_CLASS)
+    {
+      cooked_decl = NULL_TREE;
+    }
+
+  /* Find the base of the array; this normally has ARRAY_TYPE.  */
+  tree base = build_fold_indirect_ref_loc (input_location,
+					   gfc_conv_array_data (se->expr));
+  tree type = TREE_TYPE (base);
+
+  /* Handle special cases, copied from gfc_build_array_ref.  After we get
+     through this, we know TYPE definitely is an ARRAY_TYPE.  */
+  if (GFC_ARRAY_TYPE_P (type) && GFC_TYPE_ARRAY_RANK (type) == 0)
+    {
+      gcc_assert (GFC_TYPE_ARRAY_CORANK (type) > 0);
+      se->expr = fold_convert (TYPE_MAIN_VARIANT (type), base);
+      return;
+    }
+  if (TREE_CODE (type) != ARRAY_TYPE)
+    {
+      gcc_assert (cooked_decl == NULL_TREE);
+      se->expr = base;
+      return;
+    }
 
-  /* Calculate the offsets from all the dimensions.  Make sure to associate
-     the final offset so that we form a chain of loop invariant summands.  */
+  /* Check for cases where we cannot delinearize.  */
+
+  bool delinearize = flag_delinearize_aref;
+
+  /* There is no point in trying to delinearize 1-dimensional arrays.  */
+  if (ar->dimen == 1)
+    delinearize = false;
+
+  if (delinearize
+      && (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (se->expr))
+	  || (DECL_P (se->expr)
+	      && DECL_LANG_SPECIFIC (se->expr)
+	      && GFC_DECL_SAVED_DESCRIPTOR (se->expr))))
+    {
+      /* Descriptor arrays that may not be contiguous cannot
+	 be delinearized without using the stride in the descriptor,
+	 which generally involves introducing a division operation.
+	 That's unlikely to produce optimal code, so avoid doing it.  */
+      tree desc = se->expr;
+      if (!GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (se->expr)))
+	desc = GFC_DECL_SAVED_DESCRIPTOR (se->expr);
+      tree tmptype = TREE_TYPE (desc);
+      if (POINTER_TYPE_P (tmptype))
+	tmptype = TREE_TYPE (tmptype);
+      enum gfc_array_kind akind = GFC_TYPE_ARRAY_AKIND (tmptype);
+      if (akind != GFC_ARRAY_ASSUMED_SHAPE_CONT
+	  && akind != GFC_ARRAY_ASSUMED_RANK_CONT
+	  && akind != GFC_ARRAY_ALLOCATABLE
+	  && akind != GFC_ARRAY_POINTER_CONT)
+	delinearize = false;
+    }
+
+  /* See gfc_build_array_ref in trans.c.  If we have a cooked_decl or
+     vptr, then we most likely have to do pointer arithmetic using a
+     linearized array offset.  */
+  if (delinearize && cooked_decl)
+    delinearize = false;
+  else if (delinearize && get_class_array_vptr (se->expr, vptr))
+    delinearize = false;
+
+  if (!delinearize)
+    {
+      /* Initialize the offset from the array descriptor.  This accounts
+	 for the array base being something other than zero.  */
+      cst_offset = offset = gfc_index_zero_node;
+      add_to_offset (&cst_offset, &offset, gfc_conv_array_offset (decl));
+    }
+  else
+    {
+      /* If we are delinearizing, build up the nested array type using the
+	 dimension information we have for each rank.  */
+      atype = TREE_TYPE (type);
+      for (n = 0; n < ar->dimen; n++)
+	{
+	  /* We're working from the outermost nested array reference inward
+	     in this step.  ATYPE is the element type for the access in
+	     this rank; build the new array type based on the bounds
+	     information and store it back into ATYPE for the next rank's
+	     processing.  */
+	  tree lbound = get_array_lbound (decl, n, sym, ar, se);
+	  tree ubound = get_array_ubound (decl, n, sym, ar, se);
+	  tree dimen = build_range_type (TREE_TYPE (lbound),
+					 lbound, ubound);
+	  atype = build_array_type (atype, dimen);
+
+	  /* Emit a DECL_EXPR for the array type so the gimplification of
+	     its type sizes works correctly.  */
+	  if (! TYPE_NAME (atype))
+	    TYPE_NAME (atype) = build_decl (UNKNOWN_LOCATION, TYPE_DECL,
+					    NULL_TREE, atype);
+	  gfc_add_expr_to_block (&se->pre,
+				 build1 (DECL_EXPR, atype,
+					 TYPE_NAME (atype)));
+	}
+
+      /* Cast base to the innermost array type.  */
+      if (DECL_P (base))
+	TREE_ADDRESSABLE (base) = 1;
+      aref = build1 (NOP_EXPR, atype, base);
+    }
+
+  /* Process indices in reverse order.  */
   for (n = ar->dimen - 1; n >= 0; n--)
     {
       /* Calculate the index for this dimension.  */
@@ -3694,16 +3887,7 @@  gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
 	  indexse.expr = save_expr (indexse.expr);
 
 	  /* Lower bound.  */
-	  tmp = gfc_conv_array_lbound (se->expr, n);
-	  if (sym->attr.temporary)
-	    {
-	      gfc_init_se (&tmpse, se);
-	      gfc_conv_expr_type (&tmpse, ar->as->lower[n],
-				  gfc_array_index_type);
-	      gfc_add_block_to_block (&se->pre, &tmpse.pre);
-	      tmp = tmpse.expr;
-	    }
-
+	  tmp = get_array_lbound (decl, n, sym, ar, se);
 	  cond = fold_build2_loc (input_location, LT_EXPR, logical_type_node,
 				  indexse.expr, tmp);
 	  msg = xasprintf ("Index '%%ld' of dimension %d of array '%s' "
@@ -3718,16 +3902,7 @@  gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
 	     arrays.  */
 	  if (n < ar->dimen - 1 || ar->as->type != AS_ASSUMED_SIZE)
 	    {
-	      tmp = gfc_conv_array_ubound (se->expr, n);
-	      if (sym->attr.temporary)
-		{
-		  gfc_init_se (&tmpse, se);
-		  gfc_conv_expr_type (&tmpse, ar->as->upper[n],
-				      gfc_array_index_type);
-		  gfc_add_block_to_block (&se->pre, &tmpse.pre);
-		  tmp = tmpse.expr;
-		}
-
+	      tmp = get_array_ubound (decl, n, sym, ar, se);
 	      cond = fold_build2_loc (input_location, GT_EXPR,
 				      logical_type_node, indexse.expr, tmp);
 	      msg = xasprintf ("Index '%%ld' of dimension %d of array '%s' "
@@ -3740,51 +3915,41 @@  gfc_conv_array_ref (gfc_se * se, gfc_array_ref * ar, gfc_expr *expr,
 	    }
 	}
 
-      /* Multiply the index by the stride.  */
-      stride = gfc_conv_array_stride (se->expr, n);
-      tmp = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type,
-			     indexse.expr, stride);
-
-      /* And add it to the total.  */
-      add_to_offset (&cst_offset, &offset, tmp);
-    }
-
-  if (!integer_zerop (cst_offset))
-    offset = fold_build2_loc (input_location, PLUS_EXPR,
-			      gfc_array_index_type, offset, cst_offset);
+      if (!delinearize)
+	{
+	  /* Multiply the index by the stride.  */
+	  stride = gfc_conv_array_stride (decl, n);
+	  tmp = fold_build2_loc (input_location, MULT_EXPR,
+				 gfc_array_index_type,
+				 indexse.expr, stride);
 
-  /* A pointer array component can be detected from its field decl. Fix
-     the descriptor, mark the resulting variable decl and pass it to
-     build_array_ref.  */
-  if (get_CFI_desc (sym, expr, &decl, ar))
-    decl = build_fold_indirect_ref_loc (input_location, decl);
-  if (!expr->ts.deferred && !sym->attr.codimension
-      && is_pointer_array (se->expr))
-    {
-      if (TREE_CODE (se->expr) == COMPONENT_REF)
-	decl = se->expr;
-      else if (TREE_CODE (se->expr) == INDIRECT_REF)
-	decl = TREE_OPERAND (se->expr, 0);
+	  /* And add it to the total.  */
+	  add_to_offset (&cst_offset, &offset, tmp);
+	}
       else
-	decl = se->expr;
-    }
-  else if (expr->ts.deferred
-	   || (sym->ts.type == BT_CHARACTER
-	       && sym->attr.select_type_temporary))
-    {
-      if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (se->expr)))
 	{
-	  decl = se->expr;
-	  if (TREE_CODE (decl) == INDIRECT_REF)
-	    decl = TREE_OPERAND (decl, 0);
+	  /* Peel off a layer of array nesting from ATYPE to
+	     to get the result type of the new ARRAY_REF.  */
+	  atype = TREE_TYPE (atype);
+	  aref = build4 (ARRAY_REF, atype, aref, indexse.expr,
+			 NULL_TREE, NULL_TREE);
 	}
-      else
-	decl = sym->backend_decl;
     }
-  else if (sym->ts.type == BT_CLASS)
-    decl = NULL_TREE;
 
-  se->expr = build_array_ref (se->expr, offset, decl, se->class_vptr);
+  if (!delinearize)
+    {
+      /* Build a linearized array reference using the offset from all
+	 dimensions.  */
+      if (!integer_zerop (cst_offset))
+	offset = fold_build2_loc (input_location, PLUS_EXPR,
+				  gfc_array_index_type, offset, cst_offset);
+      se->class_vptr = vptr;
+      vptr = get_class_array_vptr (se->expr, vptr);
+      se->expr = gfc_build_array_ref (base, offset, cooked_decl, vptr);
+    }
+ else
+   /* Return the outermost ARRAY_REF we already built.  */
+   se->expr = aref;
 }
 
 
diff --git a/gcc/testsuite/gfortran.dg/assumed_type_2.f90 b/gcc/testsuite/gfortran.dg/assumed_type_2.f90
index 5d3cd7e..07be87e 100644
--- a/gcc/testsuite/gfortran.dg/assumed_type_2.f90
+++ b/gcc/testsuite/gfortran.dg/assumed_type_2.f90
@@ -147,12 +147,12 @@  end
 
 ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_int," 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_t1," 1 "original" } }
-! { dg-final { scan-tree-dump-times "sub_scalar .&array_int.1.," 1 "original" } }
+! { dg-final { scan-tree-dump-times "sub_scalar .&.*array_int" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_t1," 1 "original" } }
 
-! { dg-final { scan-tree-dump-times "sub_scalar .&\\(.\\(real.kind=4..0:. . restrict\\) array_real_alloc.data" 1 "original" } }
+! { dg-final { scan-tree-dump-times "sub_scalar .&.*real.kind=4..0.*restrict.*array_real_alloc.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(character.kind=1..1:1. .\\) .array_char_ptr.data" 1 "original" } }
-! { dg-final { scan-tree-dump-times "sub_scalar .&\\(.\\(struct t2.0:. . restrict\\) array_t2_alloc.data" 1 "original" } }
+! { dg-final { scan-tree-dump-times "sub_scalar .&.*struct t2.0:..*restrict.*array_t2_alloc.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t3 .\\) .array_t3_ptr.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t1 .\\) array_class_t1_alloc._data.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t1 .\\) \\(array_class_t1_ptr._data.dat" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/graphite/block-3.f90 b/gcc/testsuite/gfortran.dg/graphite/block-3.f90
index 452de73..60c7952 100644
--- a/gcc/testsuite/gfortran.dg/graphite/block-3.f90
+++ b/gcc/testsuite/gfortran.dg/graphite/block-3.f90
@@ -12,6 +12,5 @@  enddo
 
 end subroutine matrix_multiply
 
-! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite" { xfail *-*-* } } }
 ! { dg-final { scan-tree-dump-times "will be loop blocked" 1 "graphite" { xfail *-*-* } } }
 
diff --git a/gcc/testsuite/gfortran.dg/graphite/block-4.f90 b/gcc/testsuite/gfortran.dg/graphite/block-4.f90
index 42af5b6..1bc7a1b 100644
--- a/gcc/testsuite/gfortran.dg/graphite/block-4.f90
+++ b/gcc/testsuite/gfortran.dg/graphite/block-4.f90
@@ -15,6 +15,5 @@  enddo
 
 end subroutine matrix_multiply
 
-! { dg-final { scan-tree-dump-times "number of SCoPs: 1" 1 "graphite" { xfail *-*-* } } }
 ! { dg-final { scan-tree-dump-times "will be loop blocked" 1 "graphite" { xfail *-*-* } } }
 
diff --git a/gcc/testsuite/gfortran.dg/inline_matmul_24.f90 b/gcc/testsuite/gfortran.dg/inline_matmul_24.f90
index 3168d5f..8d84f3c 100644
--- a/gcc/testsuite/gfortran.dg/inline_matmul_24.f90
+++ b/gcc/testsuite/gfortran.dg/inline_matmul_24.f90
@@ -39,4 +39,4 @@  program testMATMUL
       call abort()
     end if
 end program testMATMUL
-! { dg-final { scan-tree-dump-times "gamma5\\\[__var_1_do \\* 4 \\+ __var_2_do\\\]|gamma5\\\[NON_LVALUE_EXPR <__var_1_do> \\* 4 \\+ NON_LVALUE_EXPR <__var_2_do>\\\]" 1 "original" } }
+! { dg-final { scan-tree-dump-times "gamma5.*\\\[NON_LVALUE_EXPR <__var_1_do>\\\]\\\[NON_LVALUE_EXPR <__var_2_do>\\\]" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/no_arg_check_2.f90 b/gcc/testsuite/gfortran.dg/no_arg_check_2.f90
index 3570b97..0900dd8 100644
--- a/gcc/testsuite/gfortran.dg/no_arg_check_2.f90
+++ b/gcc/testsuite/gfortran.dg/no_arg_check_2.f90
@@ -129,12 +129,12 @@  end
 
 ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_int," 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_t1," 1 "original" } }
-! { dg-final { scan-tree-dump-times "sub_scalar .&array_int.1.," 1 "original" } }
+! { dg-final { scan-tree-dump-times "sub_scalar .&.*array_int" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .&scalar_t1," 1 "original" } }
 
-! { dg-final { scan-tree-dump-times "sub_scalar .&\\(.\\(real.kind=4..0:. . restrict\\) array_real_alloc.data" 1 "original" } }
+! { dg-final { scan-tree-dump-times "sub_scalar .&.*real.kind=4..0.*restrict.*array_real_alloc.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(character.kind=1..1:1. .\\) .array_char_ptr.data" 1 "original" } }
-! { dg-final { scan-tree-dump-times "sub_scalar .&\\(.\\(struct t2.0:. . restrict\\) array_t2_alloc.data" 1 "original" } }
+! { dg-final { scan-tree-dump-times "sub_scalar .&.*struct t2.0:..*restrict.*array_t2_alloc.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t3 .\\) .array_t3_ptr.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t1 .\\) array_class_t1_alloc._data.data" 1 "original" } }
 ! { dg-final { scan-tree-dump-times "sub_scalar .\\(struct t1 .\\) \\(array_class_t1_ptr._data.dat" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/pr32921.f b/gcc/testsuite/gfortran.dg/pr32921.f
index 0661208..8534386 100644
--- a/gcc/testsuite/gfortran.dg/pr32921.f
+++ b/gcc/testsuite/gfortran.dg/pr32921.f
@@ -45,4 +45,4 @@ 
 
       RETURN
       END
-! { dg-final { scan-tree-dump-times "stride" 4 "lim2" } }
+! { dg-final { scan-tree-dump-times "ubound" 4 "lim2" } }
diff --git a/gcc/testsuite/gfortran.dg/reassoc_4.f b/gcc/testsuite/gfortran.dg/reassoc_4.f
index fdcb46e..2368b76 100644
--- a/gcc/testsuite/gfortran.dg/reassoc_4.f
+++ b/gcc/testsuite/gfortran.dg/reassoc_4.f
@@ -1,5 +1,5 @@ 
 ! { dg-do compile }
-! { dg-options "-O3 -ffast-math -fdump-tree-reassoc1 --param max-completely-peeled-insns=200" }
+! { dg-options "-O3 -ffast-math -fdump-tree-reassoc1 --param max-completely-peeled-insns=200 --param delinearize=0" }
       subroutine anisonl(w,vo,anisox,s,ii1,jj1,weight)
       integer ii1,jj1,i1,iii1,j1,jjj1,k1,l1,m1,n1
       real*8 w(3,3),vo(3,3),anisox(3,3,3,3),s(60,60),weight