From patchwork Tue Apr 10 16:00:04 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Cesar Philippidis X-Patchwork-Id: 896789 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-476153-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=mentor.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="UPAt/qc3"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40LBjj4nVyz9s1P for ; Wed, 11 Apr 2018 02:01:12 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :subject:to:message-id:date:mime-version:content-type; q=dns; s= default; b=fE3Vh0+qYAFWYMAlRDjFoAkZMboZVZNKVEDmvgTobPaxUEekifPYu C2eVW9iVwnn3kaBoZCgiB2PUASLQvZEl3TkrLR75v1u6CoeJ8dVyMQxXEhfPrGUf gt5Yostbt98bsWBhwS6DdFXJWvbd2w2sN0JNdzJHdPw0Pv/rV16BTE= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :subject:to:message-id:date:mime-version:content-type; s= default; bh=e266s8rRCYceLosRagdodjM7QjY=; b=UPAt/qc36ET7rnyU1vWz /j3xSEGpTE7ECYnYCyza6shkPPQEE+M4xiBB4JmHzXB4GgwEQlA8Rqd9y2uRLiVp 6GlRnLfNqEqJXoBsoy/0eaeoP1ZtbLVbymRBaLKKhisuZWVp7kVcNv8NVG6OHH4q kXAOwfVhyid66krCZa5tgfU= Received: (qmail 23288 invoked by alias); 10 Apr 2018 16:01:03 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 106054 invoked by uid 89); 10 Apr 2018 16:00:42 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_PASS, TIME_LIMIT_EXCEEDED, URIBL_RED autolearn=unavailable version=3.3.2 spammy=1120, warp, Detected, Hx-languages-length:5464 X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 10 Apr 2018 16:00:27 +0000 Received: from svr-orw-mbx-04.mgc.mentorg.com ([147.34.90.204]) by relay1.mentorg.com with esmtps (TLSv1.2:ECDHE-RSA-AES256-SHA384:256) id 1f5vgo-0007Zw-T7 from Cesar_Philippidis@mentor.com for gcc-patches@gcc.gnu.org; Tue, 10 Apr 2018 09:00:10 -0700 Received: from [127.0.0.1] (147.34.91.1) by SVR-ORW-MBX-04.mgc.mentorg.com (147.34.90.204) with Microsoft SMTP Server (TLS) id 15.0.1320.4; Tue, 10 Apr 2018 09:00:07 -0700 From: Cesar Philippidis Subject: [og7] Enable worker partitioning with warp-sized vector_length To: "gcc-patches@gcc.gnu.org" Message-ID: <10f9eadb-43e8-46ac-9730-50b8db484459@mentor.com> Date: Tue, 10 Apr 2018 09:00:04 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 X-ClientProxiedBy: svr-orw-mbx-04.mgc.mentorg.com (147.34.90.204) To SVR-ORW-MBX-04.mgc.mentorg.com (147.34.90.204) At present, due bugs in the nvptx state propagation code, worker partitioning must be deactivated when the user specifies a large vector_length. However, the code that is responsible for deactivating worker partitioning was not considering the case where the vector_length is the size of a warp. Consequently, not all of the CUDA threads were being utilized when vector_length = 32 (which is the default case). I've committed this patch to openacc-gcc-7-branch which allows warp-sized vectors to nest inside worker-partitioned loops. Cesar 2018-04-10 Cesar Philippidis gcc/ * config/nvptx/nvptx.c (populate_offload_attrs): Ensure that oa->vector_length is set. gcc/testsuite/ * gcc.target/nvptx/oacc-autopar.c: New test. diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index b5e6dce165c..cd89d1738aa 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -4639,7 +4639,7 @@ populate_offload_attrs (offload_attrs *oa) if (!lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (current_function_decl))) oa->vector_length = PTX_WARP_SIZE; - else if (PTX_VECTOR_LENGTH != PTX_WARP_SIZE) + else oa->vector_length = PTX_VECTOR_LENGTH; } if (oa->num_workers == 0) diff --git a/gcc/testsuite/gcc.target/nvptx/oacc-autopar.c b/gcc/testsuite/gcc.target/nvptx/oacc-autopar.c new file mode 100644 index 00000000000..915053afcbf --- /dev/null +++ b/gcc/testsuite/gcc.target/nvptx/oacc-autopar.c @@ -0,0 +1,120 @@ +/* Verify the default assignment of parallelism. */ + +/* { dg-do compile } */ +/* { dg-additional-options "-fopenacc -misa=sm_35 -fopt-info-note-omp" } */ + +void +gang_independent () +{ + int i, j; + + #pragma acc parallel + #pragma acc loop gang /* { dg-message "Detected parallelism " } */ + for (i = 0; i < 10; i++) + #pragma acc loop /* { dg-message "Detected parallelism " } */ + for (j = 0; j < 10; j++) + ; + + #pragma acc parallel vector_length (128) + #pragma acc loop gang /* { dg-message "Detected parallelism " } */ + for (i = 0; i < 10; i++) + #pragma acc loop /* { dg-message "Detected parallelism " } */ + for (j = 0; j < 10; j++) + ; +} + +void +gang_independent_seq () +{ + int i, j, k; + + #pragma acc parallel + #pragma acc loop gang /* { dg-message "Detected parallelism " } */ + for (i = 0; i < 10; i++) + #pragma acc loop /* { dg-message "Detected parallelism " } */ + for (j = 0; j < 10; j++) + #pragma acc loop seq /* { dg-message "Detected parallelism " } */ + for (k = 0; k < 10; k++) + ; + + #pragma acc parallel vector_length (128) + #pragma acc loop gang /* { dg-message "Detected parallelism " } */ + for (i = 0; i < 10; i++) + #pragma acc loop /* { dg-message "Detected parallelism " } */ + for (j = 0; j < 10; j++) + #pragma acc loop seq /* { dg-message "Detected parallelism " } */ + for (k = 0; k < 10; k++) + ; +} + +void +worker () +{ + int i, j; + + #pragma acc parallel + #pragma acc loop worker /* { dg-message "Detected parallelism " } */ + for (i = 0; i < 10; i++) + #pragma acc loop /* { dg-message "Detected parallelism " } */ + for (j = 0; j < 10; j++) + ; + + #pragma acc parallel vector_length (128) + #pragma acc loop worker /* { dg-message "Detected parallelism " } */ + for (i = 0; i < 10; i++) + #pragma acc loop /* { dg-message "Detected parallelism " } */ + for (j = 0; j < 10; j++) + ; +} + +void +fully_independent () +{ + int i, j, k; + + #pragma acc parallel + #pragma acc loop /* { dg-message "Detected parallelism " } */ + for (i = 0; i < 10; i++) + ; + + #pragma acc parallel vector_length (128) + #pragma acc loop /* { dg-message "Detected parallelism " } */ + for (i = 0; i < 10; i++) + ; + + #pragma acc parallel + #pragma acc loop /* { dg-message "Detected parallelism " } */ + for (i = 0; i < 10; i++) + #pragma acc loop /* { dg-message "Detected parallelism " } */ + for (j = 0; j < 10; j++) + ; + + /* FIXME: Should the outer loop only be gang partitioned so that the + inner loopp can utilize a large vector_length? */ + #pragma acc parallel vector_length (128) + #pragma acc loop /* { dg-message "Detected parallelism " } */ + for (i = 0; i < 10; i++) + #pragma acc loop /* { dg-message "Detected parallelism " } */ + for (j = 0; j < 10; j++) + ; + + #pragma acc parallel + #pragma acc loop /* { dg-message "Detected parallelism " } */ + for (i = 0; i < 10; i++) + #pragma acc loop /* { dg-message "Detected parallelism " } */ + for (j = 0; j < 10; j++) + #pragma acc loop /* { dg-message "Detected parallelism " } */ + for (k = 0; k < 10; k++) + ; + + /* FIXME: Should the middle loop be seq-partitioned in order to + respect vector_length = 128 on the innermost loop? */ + #pragma acc parallel vector_length (128) + #pragma acc loop /* { dg-message "Detected parallelism " } */ + for (i = 0; i < 10; i++) + #pragma acc loop /* { dg-message "Detected parallelism " } */ + for (j = 0; j < 10; j++) + #pragma acc loop /* { dg-message "Detected parallelism " } */ + for (k = 0; k < 10; k++) + ; +}