From patchwork Mon Nov 24 11:31:43 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tom de Vries
X-Patchwork-Id: 413623
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Message-ID: <5473171F.2040708@mentor.com>
Date: Mon, 24 Nov 2014 12:31:43 +0100
From: Tom de Vries
To: Richard Biener
CC: GCC Patches, Jakub Jelinek, Thomas Schwinge
Subject: Re: [PATCH, 8/8] Do simple omp lowering for no address taken var
References: <546743BC.5070804@mentor.com> <54678C29.40006@mentor.com> <54731678.9090207@mentor.com>
In-Reply-To: <54731678.9090207@mentor.com>

On 24-11-14 12:28, Tom de Vries wrote:
> On 17-11-14 11:13, Richard Biener wrote:
>> On Sat, 15 Nov 2014, Tom de Vries wrote:
>>
>>> >On 15-11-14 13:14, Tom de Vries wrote:
>>>> > >Hi,
>>>> > >
>>>> > >I'm submitting a patch series with initial support for the oacc kernels
>>>> > >directive.
>>>> > >
>>>> > >The patch series uses pass_parallelize_loops to implement parallelization of
>>>> > >loops in the oacc kernels region.
>>>> > >
>>>> > >The patch series consists of these 8 patches:
>>>> > >...
>>>> > >     1  Expand oacc kernels after pass_build_ealias
>>>> > >     2  Add pass_oacc_kernels
>>>> > >     3  Add pass_ch_oacc_kernels to pass_oacc_kernels
>>>> > >     4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
>>>> > >     5  Add pass_loop_im to pass_oacc_kernels
>>>> > >     6  Add pass_ccp to pass_oacc_kernels
>>>> > >     7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
>>>> > >     8  Do simple omp lowering for no address taken var
>>>> > >...
>>> >
>>> >This patch lowers integer variables that do not have their address taken as
>>> >local variable.  We use a copy at region entry and exit to copy the value in
>>> >and out.
>>> >
>>> >In the context of reduction handling in a kernels region, this allows the
>>> >parloops reduction analysis to recognize the reduction, even after oacc
>>> >lowering has been done in pass_lower_omp.
>>> >
>>> >In more detail, without this patch, the omp_data_i load and stores are
>>> >generated in place (in this case, in the loop):
>>> >...
>>> >  {
>>> >    .omp_data_iD.2201 = &.omp_data_arr.15D.2220;
>>> >    {
>>> >      unsigned intD.9 iD.2146;
>>> >
>>> >      iD.2146 = 0;
>>> >      goto ;
>>> >      :
>>> >      D.2216 = .omp_data_iD.2201->cD.2203;
>>> >      c.9D.2176 = *D.2216;
>>> >      D.2177 = (long unsigned intD.10) iD.2146;
>>> >      D.2178 = D.2177 * 4;
>>> >      D.2179 = c.9D.2176 + D.2178;
>>> >      D.2180 = *D.2179;
>>> >      D.2217 = .omp_data_iD.2201->sumD.2205;
>>> >      D.2218 = *D.2217;
>>> >      D.2217 = .omp_data_iD.2201->sumD.2205;
>>> >      D.2219 = D.2180 + D.2218;
>>> >      *D.2217 = D.2219;
>>> >      iD.2146 = iD.2146 + 1;
>>> >      :
>>> >      if (iD.2146 <= 524287) goto ; else goto ;
>>> >      :
>>> >    }
>>> >...
>>> >
>>> >With this patch, the omp_data_i load and stores for sum are generated at entry
>>> >and exit:
>>> >...
>>> >  {
>>> >    .omp_data_iD.2201 = &.omp_data_arr.15D.2218;
>>> >    D.2216 = .omp_data_iD.2201->sumD.2205;
>>> >    sumD.2206 = *D.2216;
>>> >    {
>>> >      unsigned intD.9 iD.2146;
>>> >
>>> >      iD.2146 = 0;
>>> >      goto ;
>>> >      :
>>> >      D.2217 = .omp_data_iD.2201->cD.2203;
>>> >      c.9D.2176 = *D.2217;
>>> >      D.2177 = (long unsigned intD.10) iD.2146;
>>> >      D.2178 = D.2177 * 4;
>>> >      D.2179 = c.9D.2176 + D.2178;
>>> >      D.2180 = *D.2179;
>>> >      sumD.2206 = D.2180 + sumD.2206;
>>> >      iD.2146 = iD.2146 + 1;
>>> >      :
>>> >      if (iD.2146 <= 524287) goto ; else goto ;
>>> >      :
>>> >    }
>>> >    *D.2216 = sumD.2206;
>>> >    #pragma omp return
>>> >  }
>>> >...
>>> >
>>> >
>>> >So, without the patch the reduction operation looks like this:
>>> >...
>>> >  *(.omp_data_iD.2201->sumD.2205) = *(.omp_data_iD.2201->sumD.2205) + x
>>> >...
>>> >
>>> >And with this patch the reduction operation is simply:
>>> >...
>>> >  sumD.2206 = sumD.2206 + x:
>>> >...
>>> >
>>> >OK for trunk?
>> I presume the reason you are trying to do that here is that otherwise
>> it happens too late?  What you do is what loop store motion would
>> do.
>
> Richard,
>
> Thanks for the hint. I've built a reduction example:
> ...
> void __attribute__((noinline))
> f (unsigned int *__restrict__ a, unsigned int *__restrict__ sum, unsigned int n)
> {
>   unsigned int i;
>   for (i = 0; i < n; ++i)
>     *sum += a[i];
> }
> ...
> and observed that store motion of the *sum store is done by pass_loop_im,
> provided the *sum load is taken out of the loop by pass_pre first.
>
> So alternatively, we could use pass_pre and pass_loop_im to achieve the same
> effect.
>
> When trying out adding pass_pre as a part of the pass group pass_oacc_kernels, I
> found that also pass_copy_prop was required to get parloops to recognize the
> reduction.
>

Attached patch adds pass_copy_prop to pass group pass_oacc_kernels.

Bootstrapped and reg-tested in the same way as before.

OK for trunk?

Thanks,
- Tom

2014-11-23  Tom de Vries

	* passes.def: Add pass_copy_prop to pass group pass_oacc_kernels.
	* tree-ssa-copy.c (stmt_may_generate_copy): Handle .omp_data_i init
	conservatively.
---
 gcc/passes.def      | 1 +
 gcc/tree-ssa-copy.c | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/gcc/passes.def b/gcc/passes.def
index 3a7b096..8c663b0 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -95,6 +95,7 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_tree_loop_init);
 	  NEXT_PASS (pass_lim);
 	  NEXT_PASS (pass_ccp);
+	  NEXT_PASS (pass_copy_prop);
 	  NEXT_PASS (pass_tree_loop_done);
       POP_INSERT_PASSES ()
       NEXT_PASS (pass_expand_omp_ssa);
diff --git a/gcc/tree-ssa-copy.c b/gcc/tree-ssa-copy.c
index 7c22c5e..d6eb7a7 100644
--- a/gcc/tree-ssa-copy.c
+++ b/gcc/tree-ssa-copy.c
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "tree-ssa-dom.h"
 #include "tree-ssa-loop-niter.h"
+#include "omp-low.h"
 
 
 /* This file implements the copy propagation pass and provides a
@@ -110,6 +111,9 @@ stmt_may_generate_copy (gimple stmt)
   if (gimple_has_volatile_ops (stmt))
     return false;
 
+  if (gimple_stmt_omp_data_i_init_p (stmt))
+    return false;
+
   /* Statements with loads and/or stores will never generate a useful copy.  */
   if (gimple_vuse (stmt))
     return false;
-- 
1.9.1
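
To make the store-motion alternative discussed in this thread concrete, here is a minimal hand-written C sketch of what hoisting the *sum load and sinking the *sum store amounts to for the quoted reduction example.  It is an illustration only: it is not compiler output and not part of the patch, and the names f_in_memory, f_store_motion and check_variants_agree are made up for this sketch.

```c
#include <assert.h>

/* The reduction as written in the thread: *sum is loaded and stored
   on every iteration.  */
void
f_in_memory (const unsigned int *__restrict__ a,
	     unsigned int *__restrict__ sum, unsigned int n)
{
  unsigned int i;
  for (i = 0; i < n; ++i)
    *sum += a[i];
}

/* Hypothetical hand-transformed variant: the load of *sum is done once
   before the loop and the store once after it, so the loop body is a
   plain scalar reduction on a local variable.  */
void
f_store_motion (const unsigned int *__restrict__ a,
		unsigned int *__restrict__ sum, unsigned int n)
{
  unsigned int i;
  unsigned int tmp = *sum;	/* load moved to loop entry */
  for (i = 0; i < n; ++i)
    tmp += a[i];
  *sum = tmp;			/* store moved to loop exit */
}

/* Small self-check: both variants must compute the same value.  */
void
check_variants_agree (void)
{
  unsigned int a[4] = { 1, 2, 3, 4 };
  unsigned int s1 = 10, s2 = 10;

  f_in_memory (a, &s1, 4);
  f_store_motion (a, &s2, 4);
  assert (s1 == s2);
}
```

In the second variant the loop body has the same shape as the "sumD.2206 = sumD.2206 + x" form quoted above, which is the form the thread says parloops recognizes as a reduction.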