From patchwork Wed Sep 5 20:05:41 2018
X-Patchwork-Submitter: Cesar Philippidis
X-Patchwork-Id: 966655
From: Cesar Philippidis
Subject: [patch][OpenACC] Add target hook TARGET_GOACC_ADJUST_PARALLELISM
To: "gcc-patches@gcc.gnu.org", Jakub Jelinek
Message-ID: <7dd10c01-ff00-1b9b-77dd-ecd79fce55e3@mentor.com>
Date: Wed, 5 Sep 2018 13:05:41 -0700

At present, GCC fixes the vector length on all targets, but that is an
artificial restriction. This patch introduces a new target hook,
TARGET_GOACC_ADJUST_PARALLELISM, that lets the accelerator compiler
adjust the worker and vector partitioning that GCC assigns to each acc
loop. Extra care needs to be taken to ensure that large vectors fit
inside workers. The default implementation returns the loop's mask
unchanged, so the hook does nothing for the host, but the nvptx back end
will make use of it.

Is this patch OK for trunk? I bootstrapped and regression-tested it on
x86_64 with nvptx offloading.
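A rough illustration of the hook's contract, as a standalone C sketch rather than code from this patch or from the nvptx back end: the GOMP_DIM_* macros follow include/gomp-constants.h, while default_adjust_parallelism, example_large_vectors and example_adjust_parallelism are hypothetical names. The override assumes a target that cannot partition one loop across both the worker and vector axes once large vectors are in use.

```c
#include <assert.h>
#include <stdbool.h>

/* Partitioning axes, following include/gomp-constants.h.  */
#define GOMP_DIM_GANG 0
#define GOMP_DIM_WORKER 1
#define GOMP_DIM_VECTOR 2
#define GOMP_DIM_MASK(X) (1u << (X))

/* Mirrors default_goacc_adjust_parallelism from the patch: the mask
   is returned unchanged.  */
unsigned
default_adjust_parallelism (unsigned this_mask, unsigned outer_mask)
{
  (void) outer_mask;
  return this_mask;
}

/* Hypothetical target state: true when the vector length is too large
   to share a loop with worker partitioning.  Not a real GCC variable.  */
bool example_large_vectors = true;

/* Hypothetical override: if a loop asks for both worker and vector
   partitioning while large vectors are in use, keep the vector axis
   and drop the worker axis.  Everything else falls back to the
   default behaviour.  */
unsigned
example_adjust_parallelism (unsigned this_mask, unsigned outer_mask)
{
  const unsigned wv = (GOMP_DIM_MASK (GOMP_DIM_WORKER)
		       | GOMP_DIM_MASK (GOMP_DIM_VECTOR));
  unsigned result;

  if (example_large_vectors && (this_mask & wv) == wv)
    result = this_mask & ~GOMP_DIM_MASK (GOMP_DIM_WORKER);
  else
    result = default_adjust_parallelism (this_mask, outer_mask);

  /* The hook is meant to remove unused parallelism, never add an axis.  */
  assert ((result & ~this_mask) == 0);
  return result;
}
```

Under that assumption, a loop requesting gang, worker and vector partitioning (mask 0x7) comes back as gang plus vector (0x5), while the default hook hands 0x7 back untouched.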
Thanks,
Cesar

[openacc] Add target hook TARGET_GOACC_ADJUST_PARALLELISM

	gcc/
	* doc/tm.texi.in: Add placeholder for TARGET_GOACC_ADJUST_PARALLELISM.
	* doc/tm.texi: Regenerate.
	* omp-offload.c (oacc_loop_fixed_partitions): Use the
	adjust_parallelism hook to modify this_mask.
	(oacc_loop_auto_partitions): Use the adjust_parallelism hook to
	modify this_mask and loop->mask.
	(default_goacc_adjust_parallelism): New function.
	* target.def (adjust_parallelism): New hook.
	* targhooks.h (default_goacc_adjust_parallelism): Declare.

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index a40f45ade07..365a7bbec90 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6029,6 +6029,12 @@ This hook should return the maximum size of a particular dimension,
 or zero if unbounded.
 @end deftypefn
 
+@deftypefn {Target Hook} unsigned TARGET_GOACC_ADJUST_PARALLELISM (unsigned @var{this_mask}, unsigned @var{outer_mask})
+This hook allows the accelerator compiler to remove any unused
+parallelism exposed in the current loop @var{THIS_MASK}, and the
+enclosing loop @var{OUTER_MASK}.  It returns an adjusted mask.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_GOACC_FORK_JOIN (gcall *@var{call}, const int *@var{dims}, bool @var{is_fork})
 This hook can be used to convert IFN_GOACC_FORK and IFN_GOACC_JOIN
 function calls to target-specific gimple, or indicate whether they
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 39a214e9b2c..9edd2e7ecaf 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4145,6 +4145,8 @@ address; but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_GOACC_DIM_LIMIT
 
+@hook TARGET_GOACC_ADJUST_PARALLELISM
+
 @hook TARGET_GOACC_FORK_JOIN
 
 @hook TARGET_GOACC_REDUCTION
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 0abf0283c9e..1659febd2b1 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1218,6 +1218,13 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)
 	}
     }
 
+  /* Ideally, we should be coalescing parallelism here if the
+     hardware supports it.  E.g., instead of partitioning a loop
+     across worker and vector axes, sometimes the hardware can
+     execute those loops together without resorting to placing
+     extra thread barriers.  */
+  this_mask = targetm.goacc.adjust_parallelism (this_mask, outer_mask);
+
   mask_all |= this_mask;
 
   if (loop->flags & OLF_TILE)
@@ -1302,6 +1309,7 @@ oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask,
 	  this_mask ^= loop->e_mask;
 	}
 
+      this_mask = targetm.goacc.adjust_parallelism (this_mask, outer_mask);
       loop->mask |= this_mask;
     }
 
@@ -1350,6 +1358,8 @@ oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask,
 	}
 
       loop->mask |= this_mask;
+      loop->mask = targetm.goacc.adjust_parallelism (loop->mask, outer_mask);
+
       if (!loop->mask && noisy)
 	warning_at (loop->loc, 0,
 		    tiling
@@ -1684,6 +1694,15 @@ default_goacc_dim_limit (int ARG_UNUSED (axis))
 #endif
 }
 
+/* Default adjustment of loop parallelism is not required.  */
+
+unsigned
+default_goacc_adjust_parallelism (unsigned this_mask,
+				  unsigned ARG_UNUSED (outer_mask))
+{
+  return this_mask;
+}
+
 namespace {
 
 const pass_data pass_data_oacc_device_lower =
diff --git a/gcc/target.def b/gcc/target.def
index c570f3825a5..401d681fc42 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1678,6 +1678,14 @@ or zero if unbounded.",
 int, (int axis),
 default_goacc_dim_limit)
 
+DEFHOOK
+(adjust_parallelism,
+"This hook allows the accelerator compiler to remove any unused\n\
+parallelism exposed in the current loop @var{THIS_MASK}, and the\n\
+enclosing loop @var{OUTER_MASK}.  It returns an adjusted mask.",
+unsigned, (unsigned this_mask, unsigned outer_mask),
+default_goacc_adjust_parallelism)
+
 DEFHOOK
 (fork_join,
 "This hook can be used to convert IFN_GOACC_FORK and IFN_GOACC_JOIN\n\
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index f92ca5ca997..38e024b13de 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -125,6 +125,7 @@ extern bool default_goacc_validate_dims (tree, int [], int);
 extern int default_goacc_dim_limit (int);
 extern bool default_goacc_fork_join (gcall *, const int [], bool);
 extern void default_goacc_reduction (gcall *);
+extern unsigned default_goacc_adjust_parallelism (unsigned, unsigned);
 
 /* These are here, and not in hooks.[ch], because not all users of
    hooks.h include tm.h, and thus we don't have CUMULATIVE_ARGS.  */
-- 
2.17.1
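The following deliberately simplified model shows how outer_mask reaches the hook as oacc_loop_fixed_partitions walks a loop nest: each loop's mask is adjusted against the axes claimed by its enclosing loops, and the adjusted mask is then added to the outer mask seen by its children, while siblings see the unchanged outer mask. struct toy_loop, toy_fixed_partitions and the two toy policies are hypothetical; the real pass also handles routines, tiling, auto partitioning and diagnostics, none of which are modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Partitioning axes, following include/gomp-constants.h.  */
#define GOMP_DIM_GANG 0
#define GOMP_DIM_WORKER 1
#define GOMP_DIM_VECTOR 2
#define GOMP_DIM_MASK(X) (1u << (X))

/* Hypothetical, heavily simplified stand-in for GCC's oacc_loop.  */
struct toy_loop
{
  unsigned mask;            /* Axes this loop is partitioned over.  */
  unsigned inner;           /* Union of the axes used by nested loops.  */
  struct toy_loop *child;   /* First nested loop, or NULL.  */
  struct toy_loop *sibling; /* Next loop at the same depth, or NULL.  */
};

/* Same shape as the TARGET_GOACC_ADJUST_PARALLELISM hook.  */
typedef unsigned (*adjust_fn) (unsigned this_mask, unsigned outer_mask);

/* Toy policy equivalent to default_goacc_adjust_parallelism.  */
unsigned
toy_default_adjust (unsigned this_mask, unsigned outer_mask)
{
  (void) outer_mask;
  return this_mask;
}

/* Toy policy for a hypothetical target without worker partitioning.  */
unsigned
toy_no_worker_adjust (unsigned this_mask, unsigned outer_mask)
{
  (void) outer_mask;
  return this_mask & ~GOMP_DIM_MASK (GOMP_DIM_WORKER);
}

/* Let ADJUST trim LOOP's mask given OUTER_MASK, record the result,
   recurse into nested loops with the enlarged outer mask and into
   siblings with the unchanged one.  Returns the union of all masks.  */
unsigned
toy_fixed_partitions (struct toy_loop *loop, unsigned outer_mask,
		      adjust_fn adjust)
{
  assert (loop != NULL);

  unsigned this_mask = adjust (loop->mask, outer_mask);
  unsigned mask_all = this_mask;

  loop->mask = this_mask;

  if (loop->child)
    {
      loop->inner = toy_fixed_partitions (loop->child,
					  outer_mask | this_mask, adjust);
      mask_all |= loop->inner;
    }

  if (loop->sibling)
    mask_all |= toy_fixed_partitions (loop->sibling, outer_mask, adjust);

  return mask_all;
}
```

With a gang loop enclosing a worker+vector loop, the default policy yields a combined mask of 0x7; the no-worker policy keeps the outer loop at 0x1, trims the inner loop to 0x4, and yields 0x5.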