From patchwork Mon Apr 30 13:33:23 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tom de Vries
X-Patchwork-Id: 906696
To: GCC Patches
CC: Thomas Schwinge
From: Tom de Vries
Subject: [og7, libgomp, nvptx, committed] Fix too-many-resources fatal error condition and message
Date: Mon, 30 Apr 2018 15:33:23 +0200

Hi,

at the moment, parallel-dims.c fails on Titan-V with a CUDA launch failure:
...
libgomp: cuLaunchKernel error: too many resources requested for launch
...

We've got a check in the libgomp nvptx plugin to prevent the CUDA launch
failure and give a more informative error message:
...
  /* Check if the accelerator has sufficient hardware resources to
     launch the offloaded kernel. */
  if (dims[GOMP_DIM_WORKER] > 1)
    {
      if (reg_granularity > 0 && dims[GOMP_DIM_WORKER] > threads_per_block)
        GOMP_PLUGIN_fatal ("The Nvidia accelerator has insufficient resources "
                           "to launch '%s'; recompile the program with "
                           "'num_workers = %d' on that offloaded region or "
                           "'-fopenacc-dim=-:%d'.\n",
                           targ_fn->launch->fn, threads_per_block,
                           threads_per_block);
    }
...

The message doesn't trigger, because reg_granularity == -1. This value
comes from dev->register_allocation_granularity, which defaults to -1
because libgomp does not have a hardcoded constant for sm_70. The
hardcoded constants that are present match 'Warp Allocation Granularity'
in the GPU Data table in CUDA_Occupancy_calculator.xls, but AFAICT there's
no column published yet for sm_70.

Furthermore, the comparison to threads_per_block is not correct. What we
want here is the maximum number of threads per block. The
threads_per_block variable only contains an approximation of that, whereas
the exact value is already available from the CUDA runtime and stored in
targ_fn->max_threads_per_block.

The left-hand side of the comparison, dims[GOMP_DIM_WORKER], is also
incorrect. It used to be correct before "[nvptx] Handle large vectors in
libgomp", when we used to do "threads_per_block /= warp_size", but now we
need to compare against dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR].

Finally, the message has not been updated to reflect that the vector
length can be larger than 32.

The patch addresses these issues.

Committed to og7.

Thanks,
- Tom

[libgomp, nvptx] Fix too-many-resources fatal error condition and message

2018-04-30  Tom de Vries

	* plugin/plugin-nvptx.c (nvptx_exec): Fix
	insufficient-resources-to-launch fatal error condition and message.

---
 libgomp/plugin/plugin-nvptx.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 9b4768f..3c00555 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -834,16 +834,15 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
 
   /* Check if the accelerator has sufficient hardware resources to
      launch the offloaded kernel. */
-  if (dims[GOMP_DIM_WORKER] > 1)
-    {
-      if (reg_granularity > 0 && dims[GOMP_DIM_WORKER] > threads_per_block)
-        GOMP_PLUGIN_fatal ("The Nvidia accelerator has insufficient resources "
-                           "to launch '%s'; recompile the program with "
-                           "'num_workers = %d' on that offloaded region or "
-                           "'-fopenacc-dim=-:%d'.\n",
-                           targ_fn->launch->fn, threads_per_block,
-                           threads_per_block);
-    }
+  if (dims[GOMP_DIM_WORKER] * dims[GOMP_DIM_VECTOR]
+      > targ_fn->max_threads_per_block)
+    GOMP_PLUGIN_fatal ("The Nvidia accelerator has insufficient resources to"
+                       " launch '%s' with num_workers = %d and vector_length ="
+                       " %d; recompile the program with 'num_workers = x and"
+                       " vector_length = y' on that offloaded region or "
+                       "'-fopenacc-dim=-:x:y' where x * y <= %d.\n",
+                       targ_fn->launch->fn, dims[GOMP_DIM_WORKER],
+                       dims[GOMP_DIM_VECTOR], targ_fn->max_threads_per_block);
 
   GOMP_PLUGIN_debug (0, " %s: kernel %s: launch"
                      " gangs=%u, workers=%u, vectors=%u\n",
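
For illustration only, the difference between the removed and the added
condition can be sketched as two standalone predicates. This is not code
from libgomp: the DIM_* indices and the function names old_check_fires and
new_check_fires are hypothetical stand-ins, and the parameters merely
mirror the variables named in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the plugin's GOMP_DIM_* indices; the real
   values are defined by GCC and are not reproduced here. */
enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };

/* Condition removed by the patch: it only looks at the worker count and
   is skipped entirely when reg_granularity is not positive. */
static bool
old_check_fires (const int dims[DIM_MAX], int reg_granularity,
                 int threads_per_block)
{
  return dims[DIM_WORKER] > 1
         && reg_granularity > 0
         && dims[DIM_WORKER] > threads_per_block;
}

/* Condition added by the patch: the threads in one block are
   workers * vector length, compared against the per-kernel limit. */
static bool
new_check_fires (const int dims[DIM_MAX], int max_threads_per_block)
{
  return dims[DIM_WORKER] * dims[DIM_VECTOR] > max_threads_per_block;
}
```

With made-up values such as 32 workers, a vector length of 32, a limit of
512 and reg_granularity == -1, the old predicate stays false (the
granularity guard short-circuits it), while the new one is true because
32 * 32 exceeds 512. These numbers are assumptions chosen for the
example, not values measured on Titan-V.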