From patchwork Wed Aug 3 15:16:38 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Biener X-Patchwork-Id: 108270 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 51909B71E2 for ; Thu, 4 Aug 2011 01:17:01 +1000 (EST) Received: (qmail 31187 invoked by alias); 3 Aug 2011 15:16:57 -0000 Received: (qmail 31159 invoked by uid 22791); 3 Aug 2011 15:16:55 -0000 X-SWARE-Spam-Status: No, hits=-3.7 required=5.0 tests=AWL, BAYES_00, RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from cantor2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 03 Aug 2011 15:16:40 +0000 Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.221.2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mx2.suse.de (Postfix) with ESMTP id 102898B013; Wed, 3 Aug 2011 17:16:39 +0200 (CEST) Date: Wed, 3 Aug 2011 17:16:38 +0200 (CEST) From: Richard Guenther To: gcc-patches@gcc.gnu.org Cc: fortran@gcc.gnu.org Subject: Re: [PATCH][RFC] Fix PR49957 - build array index differently In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org On Wed, 3 Aug 2011, Richard Guenther wrote: > > This fixes PR49957 by keeping the array index into a multi-dimensional > array in optimal associated form which is ((off + outermost) + ...) + > innermost) + constant) so that dependence analysis can properly > handle it. It doesn't work right now because we build the > expression in reverse order, fold thinks it can do some fancy and > the expression is of signed type and thus we know it doesn't overflow > but we also won't re-associate it to a more optimal form. > > I tried reversing the loop in gfc_conv_array_ref but that doesn't > work (for example aliasing_dummy_4.f90 ICEs), thus the funny way > of chaining the pluses. > > I also don't know if there is maybe another place we build similar > expressions that should be adjusted, too - this one is where > we build the expression for the testcase I looked at. > > The patch doesn't make 410.bwaves measurably faster, but at least > it also doesn't get slower. > > Bootstrap and regtest is currently running on x86_64-unknown-linux-gnu, > the reversed loop one was ok (well, apart from those 2-3 fails). > > Comments? Any idea why reversing the loop would break? > The ICE I got is > > /space/rguenther/src/svn/trunk/gcc/testsuite/gfortran.dg/aliasing_dummy_4.f90: > In function 'test_f90': > /space/rguenther/src/svn/trunk/gcc/testsuite/gfortran.dg/aliasing_dummy_4.f90:21:0: > internal compiler error: in gfc_conv_constant, at > fortran/trans-const.c:387 > Please submit a full bug report, > with preprocessed source if appropriate. > See for instructions. Tobias noticed the testcase is optimized away, fixed. Also fixed is another oversight for another loop in 410.bwaves, with constant first index. Re-bootstrapping currently. The previous patch tested ok. Richard. 2011-08-03 Richard Guenther PR fortran/49957 * trans-array.c (gfc_conv_array_ref): Build the array index expression in optimally associated order. * gfortran.dg/vect/O3-pr49957.f: New testcase. Index: gcc/fortran/trans-array.c =================================================================== --- gcc/fortran/trans-array.c (revision 177094) +++ gcc/fortran/trans-array.c (working copy) @@ -2634,7 +2634,7 @@ gfc_conv_array_ref (gfc_se * se, gfc_arr locus * where) { int n; - tree index; + tree index, offset, *indexp; tree tmp; tree stride; gfc_se indexse; @@ -2669,9 +2669,16 @@ gfc_conv_array_ref (gfc_se * se, gfc_arr return; } - index = gfc_index_zero_node; + offset = gfc_conv_array_offset (se->expr); + if (TREE_CODE (offset) != INTEGER_CST) + index = offset; + else + index = gfc_index_zero_node; - /* Calculate the offsets from all the dimensions. */ + indexp = &index; + + /* Calculate the offsets from all the dimensions. Make sure to associate + the final offset so that we form a chain of loop invariant summands. */ for (n = 0; n < ar->dimen; n++) { /* Calculate the index for this dimension. */ @@ -2740,15 +2747,43 @@ gfc_conv_array_ref (gfc_se * se, gfc_arr tmp = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type, indexse.expr, stride); - /* And add it to the total. */ - index = fold_build2_loc (input_location, PLUS_EXPR, - gfc_array_index_type, index, tmp); + /* And add it to the total. Avoid folding as that re-associates + in a non-optimal way. We want to have constant offsets as + the outermost addition and the rest of the additions in order + of the loop depth. */ + if (!integer_zerop (index)) + { + if (TREE_CODE (tmp) == INTEGER_CST) + { + bool reset = indexp == &index; + index = fold_build2_loc (input_location, PLUS_EXPR, + gfc_array_index_type, index, tmp); + if (reset) + { + if (TREE_CODE (index) == PLUS_EXPR) + indexp = &TREE_OPERAND (index, 0); + else + indexp = &index; + } + } + else + { + *indexp + = build2_loc (input_location, PLUS_EXPR, + gfc_array_index_type, *indexp, tmp); + indexp = &TREE_OPERAND (*indexp, 0); + } + } + else + { + index = tmp; + indexp = &index; + } } - tmp = gfc_conv_array_offset (se->expr); - if (!integer_zerop (tmp)) + if (TREE_CODE (offset) == INTEGER_CST) index = fold_build2_loc (input_location, PLUS_EXPR, - gfc_array_index_type, index, tmp); + gfc_array_index_type, index, offset); /* Access the calculated element. */ tmp = gfc_conv_array_data (se->expr); Index: gcc/testsuite/gfortran.dg/vect/O3-pr49957.f =================================================================== --- gcc/testsuite/gfortran.dg/vect/O3-pr49957.f (revision 0) +++ gcc/testsuite/gfortran.dg/vect/O3-pr49957.f (revision 0) @@ -0,0 +1,16 @@ +! { dg-do compile } +! { dg-require-effective-target vect_double } + subroutine shell(nx,ny,nz,q,dq) + implicit none + integer i,j,k,l,nx,ny,nz + real*8 q(5,nx,ny),dq(5,nx,ny) + do j=1,ny + do i=1,nx + do l=1,5 + q(l,i,j)=q(l,i,j)+dq(l,i,j) + enddo + enddo + enddo + return + end +! { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail vect_no_align } } }