From patchwork Wed Sep 13 23:20:32 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Cesar Philippidis
X-Patchwork-Id: 813667
From: Cesar Philippidis
Subject: [OpenACC] Enable SIMD vectorization on vector loops
To: "gcc-patches@gcc.gnu.org"
Message-ID: <72c9e096-70fc-bc55-200b-2b51eb5c2e0b@mentor.com>
Date: Wed, 13 Sep 2017 16:20:32 -0700

This patch enables SIMD vectorization on non-SIMT targets in acc vector
loops. It does so by setting the force_vectorize flag in a similar
manner to OpenMP SIMD loops.

Unlike OpenMP, OpenACC provides the compiler with the flexibility to
assign gang, worker and vector parallelism to independent acc loops. At
present, automatic parallelism is assigned during the oacc device lower
pass, specifically inside oacc_loop_process.
Consequently, this patch applies the force_vectorize flag late, when the
GOACC_LOOP internal functions are expanded into target-specific code.
Note that expand_oacc_for may construct two loops for each acc loop; the
outer loop represents the "chunking" factor, whereas the inner loops are
for individual gang, worker and vector threads.

Also note that OpenACC permits the user to apply any combination of
gang, worker and vector level parallelism to each loop, e.g., acc loop
gang vector. However, oacc_xform_loop does not strip-mine the acc loops
to take advantage of this on non-SIMT targets as it does for SIMT
targets. Therefore, the force_vectorize flag is only set when the acc
loop has been assigned vector partitioning.

Is this patch OK for trunk?

Cesar

2017-09-13  Cesar Philippidis

	gcc/
	* omp-offload.c (oacc_xform_loop): Enable SIMD vectorization on
	non-SIMT targets in acc vector loops.

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 2d4fd411680..9d5b8bef649 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "cfgloop.h"
 
 /* Describe the OpenACC looping structure of a function.  The entire
    function is held in a 'NULL' loop.  */
@@ -370,6 +371,30 @@ oacc_xform_loop (gcall *call)
       break;
 
     case IFN_GOACC_LOOP_OFFSET:
+      /* Enable vectorization on non-SIMT targets.  */
+      if (!targetm.simt.vf
+          && outer_mask == GOMP_DIM_MASK (GOMP_DIM_VECTOR)
+          /* If not -fno-tree-loop-vectorize, hint that we want to vectorize
+             the loop.  */
+          && (flag_tree_loop_vectorize
+              || !global_options_set.x_flag_tree_loop_vectorize))
+        {
+          basic_block bb = gsi_bb (gsi);
+          struct loop *parent = bb->loop_father;
+          struct loop *body = parent->inner;
+
+          parent->force_vectorize = true;
+          parent->safelen = INT_MAX;
+
+          /* "Chunking loops" may have inner loops.  */
+          if (parent->inner)
+            {
+              body->force_vectorize = true;
+              body->safelen = INT_MAX;
+            }
+
+          cfun->has_force_vectorize_loops = true;
+        }
       if (striding)
         {
           r = oacc_thread_numbers (true, mask, &seq);
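
For illustration only, here is a minimal sketch of the kind of loop this
change is aimed at: an acc loop with explicit vector partitioning and no
gang or worker clause. It is not part of the patch or its testsuite; the
function name saxpy_vector and the data clauses are hypothetical.

```c
/* Hypothetical example, not taken from the patch: a loop that is
   assigned vector partitioning only.  On a non-SIMT target (e.g. host
   fallback), this is the case where the patch is meant to set
   force_vectorize.  Without -fopenacc the pragma is ignored and the
   loop simply runs sequentially.  */
void
saxpy_vector (int n, float a, const float *x, float *y)
{
  int i;

#pragma acc parallel loop vector copyin (x[0:n]) copy (y[0:n])
  for (i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
}
```

Building this with something like "gcc -O2 -fopenacc -fopt-info-vec"
should report whether the loop was vectorized. The result depends on the
target and on whether this patch is applied, so treat it as a way to
check, not as an expected outcome.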