From patchwork Wed Jan 31 15:29:57 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Cesar Philippidis
X-Patchwork-Id: 867991
From: Cesar Philippidis
Subject: [og7] Enable firstprivate OpenACC reductions
To: "gcc-patches@gcc.gnu.org", Chung-Lin Tang
Message-ID: <93786c8e-047c-e526-4913-d7f360dd1fe0@codesourcery.com>
Date: Wed, 31 Jan 2018 07:29:57 -0800

I've applied this patch to openacc-gcc-7-branch. It teaches the
gimplifier how to pass certain OpenACC reduction variables as
firstprivate, and not with an implicit copy clause. This matches the
default behavior for the implicit data mappings of scalar variables
inside OpenACC parallel regions.
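For illustration only, here is a minimal sketch of the kind of nested
reduction that the firstprivate mapping above is meant for. It is not
part of the patch: the names nested_reduction, check_nested_reduction,
N_OUTER and N_INNER are hypothetical, and the sketch assumes that the
reduction on the inner loop leaves the scalar firstprivate on the
enclosing parallel region. Without -fopenacc the pragmas are ignored
and the loops simply run on the host.

```c
#include <assert.h>

/* Hypothetical sizes, chosen only for this sketch.  */
#define N_OUTER 8
#define N_INNER 16

/* Store i + N_INNER into out[i].  The scalar 'result' carries no
   explicit data clause, so it relies on the implicit mapping of the
   enclosing parallel region (assumed to be firstprivate here).  */
void
nested_reduction (int out[N_OUTER])
{
  int i, j;
  int result = 0;

#pragma acc parallel copyout (out[0:N_OUTER])
#pragma acc loop
  for (i = 0; i < N_OUTER; i++)
    {
      /* Each outer iteration restarts the reduction from i.  */
      result = i;

#pragma acc loop reduction (+:result)
      for (j = 0; j < N_INNER; j++)
        result++;

      /* Only the array carries values back to the host.  */
      out[i] = result;
    }
}

/* Host-side check of the values produced above.  */
void
check_nested_reduction (void)
{
  int out[N_OUTER];
  int i;

  nested_reduction (out);
  for (i = 0; i < N_OUTER; i++)
    assert (out[i] == i + N_INNER);
}
```

Calling check_nested_reduction () from a small driver exercises the
sketch on the host. Building it with -fopenacc -fdump-tree-gimple
should make the chosen data mapping visible in the gimple dump,
although the exact dump text depends on the compiler version.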

Note that the gimplifier will still implicitly map reduction variables
as copy on loops immediately enclosed inside a parallel region, like so

  #pragma acc parallel
  #pragma acc loop reduction(+:sum)

This change only impacts reduction variables inside nested acc loops
like

  #pragma acc parallel
  #pragma acc loop
  for (...)
    {
  #pragma acc loop reduction(+:s2)

Here s2 will be transferred to the accelerator as firstprivate instead
of copy.

Chung-Lin, I think one of us should go back and clean up the reduction
variable logic inside gimplify.c:omp_add_variable. I know that it's
been a while since you last worked on this. Let me know if you have any
state on that code, otherwise I'll handle the cleanup.

Cesar

Enable firstprivate OpenACC reductions

2018-01-31  Cesar Philippidis

        gcc/
        * gimplify.c (omp_add_variable): Allow certain OpenACC reduction
        variables to remain firstprivate.

        gcc/testsuite/
        * c-c++-common/goacc/reduction-8.c: New test.

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 72ed8f1a249..44c03ab8310 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -6740,9 +6740,16 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
   else
     splay_tree_insert (ctx->variables, (splay_tree_key)decl, flags);
 
-  /* For reductions clauses in OpenACC loop directives, by default create a
-     copy clause on the enclosing parallel construct for carrying back the
-     results.  */
+  /* For OpenACC loop directives, when a reduction is immediately
+     enclosed within an acc parallel or kernels construct, it must
+     have an implied copy data mapping. E.g.
+
+       #pragma acc parallel
+         {
+           #pragma acc loop reduction (+:sum)
+
+     a copy clause for sum should be added on the enclosing parallel
+     construct for carrying back the results.  */
   if (ctx->region_type == ORT_ACC && (flags & GOVD_REDUCTION))
     {
       struct gimplify_omp_ctx *outer_ctx = ctx->outer_context;
@@ -6758,8 +6765,11 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
           vector = true;
         }
 
-      /* Set new copy map as 'private' if sure we're not gang-partitioning.  */
-      bool map_private;
+      /* Reduction data maps need to be marked as private for worker
+         and vector loops, in order to ensure that value of the
+         reduction carried back to the host.  Set new copy map as
+         'private' if sure we're not gang-partitioning.  */
+      bool map_private, update_data_map = false;
 
       if (gang)
         map_private = false;
@@ -6768,6 +6778,10 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
       else
         map_private = oacc_privatize_reduction (ctx->outer_context);
 
+      if (ctx->outer_context
+          && ctx->outer_context->region_type == ORT_ACC_PARALLEL)
+        update_data_map = true;
+
       while (outer_ctx)
         {
           n = splay_tree_lookup (outer_ctx->variables, (splay_tree_key)decl);
@@ -6784,7 +6798,8 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
                   gcc_assert (!(n->value & GOVD_FIRSTPRIVATE)
                               && (n->value & GOVD_MAP));
                 }
-              else if (outer_ctx->region_type == ORT_ACC_PARALLEL)
+              else if (update_data_map
+                       && outer_ctx->region_type == ORT_ACC_PARALLEL)
                 {
                   /* Remove firstprivate and make it a copy map.  */
                   n->value &= ~GOVD_FIRSTPRIVATE;
@@ -6796,7 +6811,8 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
                     n->value |= GOVD_MAP_PRIVATE;
                 }
             }
-          else if (outer_ctx->region_type == ORT_ACC_PARALLEL)
+          else if (update_data_map
+                   && outer_ctx->region_type == ORT_ACC_PARALLEL)
             {
               unsigned f = GOVD_MAP | GOVD_SEEN;
 
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-8.c b/gcc/testsuite/c-c++-common/goacc/reduction-8.c
new file mode 100644
index 00000000000..8a0283f4ac3
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-8.c
@@ -0,0 +1,94 @@
+/* { dg-additional-options "-fdump-tree-gimple" } */
+
+#define n 1000
+
+int
+main(void)
+{
+  int i, j;
+  int result, array[n];
+
+#pragma acc parallel loop reduction (+:result)
+  for (i = 0; i < n; i++)
+    result ++;
+
+#pragma acc parallel
+#pragma acc loop reduction (+:result)
+  for (i = 0; i < n; i++)
+    result ++;
+
+#pragma acc parallel
+#pragma acc loop
+  for (i = 0; i < n; i++)
+    {
+      result = i;
+
+#pragma acc loop reduction(+:result)
+      for (j = 0; j < n; j++)
+        result ++;
+
+      array[i] = result;
+    }
+
+#pragma acc parallel
+#pragma acc loop
+  for (i = 0; i < n; i++)
+    {
+      result = i;
+
+#pragma acc loop worker vector reduction(+:result)
+      for (j = 0; j < n; j++)
+        result ++;
+
+      array[i] = result;
+    }
+
+#pragma acc parallel
+#pragma acc loop // { dg-warning "insufficient partitioning" }
+  for (i = 0; i < n; i++)
+    {
+      result = i;
+
+#pragma acc loop gang reduction(+:result)
+      for (j = 0; j < n; j++)
+        result ++;
+
+      array[i] = result;
+    }
+
+#pragma acc parallel copy(result)
+#pragma acc loop // { dg-warning "insufficient partitioning" }
+  for (i = 0; i < n; i++)
+    {
+      result = i;
+
+#pragma acc loop gang reduction(+:result)
+      for (j = 0; j < n; j++)
+        result ++;
+
+      array[i] = result;
+    }
+
+#pragma acc kernels
+#pragma acc loop
+  for (i = 0; i < n; i++)
+    {
+      result = i;
+
+#pragma acc loop reduction(+:result)
+      for (j = 0; j < n; j++)
+        result ++;
+
+      array[i] = result;
+    }
+
+  return 0;
+}
+
+/* Check that default copy maps are generated for loop reductions.  */
+/* { dg-final { scan-tree-dump-times "reduction..:result. map.tofrom:result .len: 4.." 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "oacc_parallel map.tofrom:result .len: 4.." 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "map.tofrom:array .len: 4000.. firstprivate.result." 3 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "map.tofrom:result .len: 4.. map.tofrom:array .len: 4000.." 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "map.tofrom:array .len: 4000.. map.force_tofrom:result .len: 4.." 1 "gimple" } } */
+