diff mbox

[RFC] Fix PR49957 - build array index differently

Message ID alpine.LNX.2.00.1108031715330.810@zhemvz.fhfr.qr
State New
Headers show

Commit Message

Richard Biener Aug. 3, 2011, 3:16 p.m. UTC
On Wed, 3 Aug 2011, Richard Guenther wrote:

> 
> This fixes PR49957 by keeping the array index into a multi-dimensional
> array in optimal associated form which is ((off + outermost) + ...) + 
> innermost) + constant) so that dependence analysis can properly
> handle it.  It doesn't work right now because we build the
> expression in reverse order, fold thinks it can do some fancy and
> the expression is of signed type and thus we know it doesn't overflow
> but we also won't re-associate it to a more optimal form.
> 
> I tried reversing the loop in gfc_conv_array_ref but that doesn't
> work (for example aliasing_dummy_4.f90 ICEs), thus the funny way
> of chaining the pluses.
> 
> I also don't know if there is maybe another place we build similar
> expressions that should be adjusted, too - this one is where
> we build the expression for the testcase I looked at.
> 
> The patch doesn't make 410.bwaves measurably faster, but at least
> it also doesn't get slower.
> 
> Bootstrap and regtest is currently running on x86_64-unknown-linux-gnu,
> the reversed loop one was ok (well, apart from those 2-3 fails).
> 
> Comments?  Any idea why reversing the loop would break?
> The ICE I got is
> 
> /space/rguenther/src/svn/trunk/gcc/testsuite/gfortran.dg/aliasing_dummy_4.f90: 
> In function 'test_f90':
> /space/rguenther/src/svn/trunk/gcc/testsuite/gfortran.dg/aliasing_dummy_4.f90:21:0: 
> internal compiler error: in gfc_conv_constant, at 
> fortran/trans-const.c:387
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <http://gcc.gnu.org/bugs.html> for instructions.

Tobias noticed the testcase is optimized away, fixed.  Also fixed
is another oversight for another loop in 410.bwaves, with constant
first index.

Re-bootstrapping currently.  The previous patch tested ok.

Richard.

2011-08-03  Richard Guenther  <rguenther@suse.de>

	PR fortran/49957
	* trans-array.c (gfc_conv_array_ref): Build the array index
	expression in optimally associated order.

	* gfortran.dg/vect/O3-pr49957.f: New testcase.

Comments

Tobias Burnus Aug. 3, 2011, 9:03 p.m. UTC | #1
Richard Guenther wrote:
> On Wed, 3 Aug 2011, Richard Guenther wrote:
>> Bootstrap and regtest is currently running on x86_64-unknown-linux-gnu,
>> the reversed loop one was ok (well, apart from those 2-3 fails).
> Re-bootstrapping currently.  The previous patch tested ok.

The patch looks OK. I don't know why the your first attempt of reversing 
the loops failed - but admittedly, I also do not exactly know what you 
did. I failed to come up with some other place where the order could be 
noncanonical; for instance, I think the code generated for arrays of 
strings should be fine.

Thanks for the patch!

Tobias
diff mbox

Patch

Index: gcc/fortran/trans-array.c
===================================================================
--- gcc/fortran/trans-array.c	(revision 177094)
+++ gcc/fortran/trans-array.c	(working copy)
@@ -2634,7 +2634,7 @@  gfc_conv_array_ref (gfc_se * se, gfc_arr
 		    locus * where)
 {
   int n;
-  tree index;
+  tree index, offset, *indexp;
   tree tmp;
   tree stride;
   gfc_se indexse;
@@ -2669,9 +2669,16 @@  gfc_conv_array_ref (gfc_se * se, gfc_arr
       return;
     }
 
-  index = gfc_index_zero_node;
+  offset = gfc_conv_array_offset (se->expr);
+  if (TREE_CODE (offset) != INTEGER_CST)
+    index = offset;
+  else
+    index = gfc_index_zero_node;
 
-  /* Calculate the offsets from all the dimensions.  */
+  indexp = &index;
+
+  /* Calculate the offsets from all the dimensions.  Make sure to associate
+     the final offset so that we form a chain of loop invariant summands.  */
   for (n = 0; n < ar->dimen; n++)
     {
       /* Calculate the index for this dimension.  */
@@ -2740,15 +2747,43 @@  gfc_conv_array_ref (gfc_se * se, gfc_arr
       tmp = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type,
 			     indexse.expr, stride);
 
-      /* And add it to the total.  */
-      index = fold_build2_loc (input_location, PLUS_EXPR,
-			       gfc_array_index_type, index, tmp);
+      /* And add it to the total.  Avoid folding as that re-associates
+         in a non-optimal way.  We want to have constant offsets as
+	 the outermost addition and the rest of the additions in order
+	 of the loop depth.  */
+      if (!integer_zerop (index))
+	{
+	  if (TREE_CODE (tmp) == INTEGER_CST)
+	    {
+	      bool reset = indexp == &index;
+	      index = fold_build2_loc (input_location, PLUS_EXPR,
+				       gfc_array_index_type, index, tmp);
+	      if (reset)
+		{
+		  if (TREE_CODE (index) == PLUS_EXPR)
+		    indexp = &TREE_OPERAND (index, 0);
+		  else
+		    indexp = &index;
+		}
+	    }
+	  else
+	    {
+	      *indexp
+		= build2_loc (input_location, PLUS_EXPR,
+			      gfc_array_index_type, *indexp, tmp);
+	      indexp = &TREE_OPERAND (*indexp, 0);
+	    }
+	}
+      else
+	{
+	  index = tmp;
+	  indexp = &index;
+	}
     }
 
-  tmp = gfc_conv_array_offset (se->expr);
-  if (!integer_zerop (tmp))
+  if (TREE_CODE (offset) == INTEGER_CST)
     index = fold_build2_loc (input_location, PLUS_EXPR,
-			     gfc_array_index_type, index, tmp);
+			     gfc_array_index_type, index, offset);
 
   /* Access the calculated element.  */
   tmp = gfc_conv_array_data (se->expr);
Index: gcc/testsuite/gfortran.dg/vect/O3-pr49957.f
===================================================================
--- gcc/testsuite/gfortran.dg/vect/O3-pr49957.f	(revision 0)
+++ gcc/testsuite/gfortran.dg/vect/O3-pr49957.f	(revision 0)
@@ -0,0 +1,16 @@ 
+! { dg-do compile }
+! { dg-require-effective-target vect_double }
+      subroutine shell(nx,ny,nz,q,dq)
+      implicit none
+      integer i,j,k,l,nx,ny,nz
+      real*8 q(5,nx,ny),dq(5,nx,ny)
+         do j=1,ny
+            do i=1,nx
+               do l=1,5
+                  q(l,i,j)=q(l,i,j)+dq(l,i,j)
+               enddo
+            enddo
+         enddo
+      return
+      end
+! { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail vect_no_align } } }