From: Cesar Philippidis
Subject: [og7] Adjust k80 resources
To: gcc-patches@gcc.gnu.org
Date: Fri, 11 Aug 2017 12:38:14 -0700
Message-ID: <127e6708-d713-a34d-753c-d0ce978a8172@codesourcery.com>

I've pushed this patch to openacc-gcc-7-branch to teach the libgomp nvptx plugin how to cope with the hardware resources on K80 boards.

A K80 board has two physical GPUs. Consequently, the CUDA driver reports twice the registers and shared memory per multiprocessor that a single GPU actually has, and those values are wrong when only one of the GPUs is being used. As a result, the runtime could not inform the user that the K80 lacks sufficient hardware resources to execute certain offloaded kernels.
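To illustrate why an inflated limit hides the problem, here is a minimal C sketch of a per-multiprocessor resource check. The helper `kernel_fits` and every number used with it are hypothetical and chosen only for illustration; this is not the check libgomp performs.

```c
#include <stdbool.h>

/* Hypothetical check (not libgomp code): can one block of THREADS
   threads, each using REGS_PER_THREAD registers, plus
   SHARED_PER_BLOCK bytes of shared memory, be resident on a
   multiprocessor that reports REGS_PER_SM registers and
   SHARED_PER_SM bytes of shared memory?  */
static bool
kernel_fits (int regs_per_sm, int shared_per_sm,
             int regs_per_thread, int threads, int shared_per_block)
{
  /* Widen before multiplying so large blocks cannot overflow.  */
  long long regs_needed = (long long) regs_per_thread * threads;

  return regs_needed <= regs_per_sm
         && shared_per_block <= shared_per_sm;
}
```

For example, a kernel using 128 registers per thread in 1024-thread blocks needs 128 * 1024 = 131072 registers. `kernel_fits (131072, 98304, 128, 1024, 0)` returns true, so a runtime trusting a doubled register figure of 131072 would go ahead, while `kernel_fits (65536, 49152, 128, 1024, 0)` returns false, which is the answer it should reach if only half of the reported resources belong to the GPU in use.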
Unfortunately, I don't have a test case that reproduces this failure, but it does show up in various OpenACC tests such as CloverLeaf. Later, I'll try to create a reduced test case that uses a lot of hardware registers.

Cesar

2017-08-11  Cesar Philippidis

	libgomp/
	* plugin/cuda/cuda.h (CUdevice_attribute): Add
	CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR,
	CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR.
	* plugin/plugin-nvptx.c (struct ptx_device): Add
	compute_capability_major, compute_capability_minor members.
	(nvptx_open_device): Probe driver for those values.  Adjust
	regs_per_sm and max_shared_memory_per_multiprocessor for K80
	hardware.

diff --git a/libgomp/plugin/cuda/cuda.h b/libgomp/plugin/cuda/cuda.h
index 25d5d1913b0..94a693cbdef 100644
--- a/libgomp/plugin/cuda/cuda.h
+++ b/libgomp/plugin/cuda/cuda.h
@@ -69,6 +69,8 @@ typedef enum {
   CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS = 31,
   CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39,
   CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40,
+  CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75,
+  CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76,
   CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82
 } CUdevice_attribute;
 
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 37e1f6efbe1..10f000ab3c1 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -285,7 +285,9 @@ struct ptx_device
   bool map;
   bool concur;
   bool mkern;
-  int mode;
+  int mode;
+  int compute_capability_major;
+  int compute_capability_minor;
   int clock_khz;
   int num_sms;
   int regs_per_block;
@@ -448,6 +450,14 @@ nvptx_open_device (int n)
   ptx_dev->mode = pi;
 
   CUDA_CALL_ERET (NULL, cuDeviceGetAttribute,
+                  &pi, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, dev);
+  ptx_dev->compute_capability_major = pi;
+
+  CUDA_CALL_ERET (NULL, cuDeviceGetAttribute,
+                  &pi, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, dev);
+  ptx_dev->compute_capability_minor = pi;
+
+  CUDA_CALL_ERET (NULL, cuDeviceGetAttribute,
                   &pi, CU_DEVICE_ATTRIBUTE_INTEGRATED, dev);
   ptx_dev->mkern = pi;
 
@@ -512,20 +522,37 @@ nvptx_open_device (int n)
 
   GOMP_PLUGIN_debug (0, "Nvidia device %d:\n\tGPU_OVERLAP = %d\n"
                      "\tCAN_MAP_HOST_MEMORY = %d\n\tCONCURRENT_KERNELS = %d\n"
-                     "\tCOMPUTE_MODE = %d\n\tINTEGRATED = %d\n"
+                     "\tCOMPUTE_MODE = %d\n"
+                     "\tCU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = %d\n"
+                     "\tCU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = %d\n"
+                     "\tINTEGRATED = %d\n"
                      "\tMAX_THREADS_PER_BLOCK = %d\n\tWARP_SIZE = %d\n"
                      "\tMULTIPROCESSOR_COUNT = %d\n"
                      "\tMAX_THREADS_PER_MULTIPROCESSOR = %d\n"
                      "\tMAX_REGISTERS_PER_MULTIPROCESSOR = %d\n"
                      "\tMAX_SHARED_MEMORY_PER_MULTIPROCESSOR = %d\n",
                      ptx_dev->ord, ptx_dev->overlap, ptx_dev->map,
-                     ptx_dev->concur, ptx_dev->mode, ptx_dev->mkern,
-                     ptx_dev->max_threads_per_block, ptx_dev->warp_size,
-                     ptx_dev->num_sms,
+                     ptx_dev->concur, ptx_dev->mode,
+                     ptx_dev->compute_capability_major,
+                     ptx_dev->compute_capability_minor,
+                     ptx_dev->mkern, ptx_dev->max_threads_per_block,
+                     ptx_dev->warp_size, ptx_dev->num_sms,
                      ptx_dev->max_threads_per_multiprocessor,
                      ptx_dev->regs_per_sm,
                      ptx_dev->max_shared_memory_per_multiprocessor);
 
+  /* K80 (SM_37) boards contain two physical GPUs.  Consequently they
+     report 2x larger values for MAX_REGISTERS_PER_MULTIPROCESSOR and
+     MAX_SHARED_MEMORY_PER_MULTIPROCESSOR.  Those values need to be
+     adjusted in order to allow nvptx_exec to select an
+     appropriate num_workers.  */
+  if (ptx_dev->compute_capability_major == 3
+      && ptx_dev->compute_capability_minor == 7)
+    {
+      ptx_dev->regs_per_sm /= 2;
+      ptx_dev->max_shared_memory_per_multiprocessor /= 2;
+    }
+
   return ptx_dev;
 }
 
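The adjustment in nvptx_open_device amounts to halving two per-multiprocessor limits when the device reports compute capability 3.7. The sketch below isolates that rule in a standalone function operating on a hypothetical `dev_limits` struct (not libgomp's `struct ptx_device`), and adds a hypothetical `max_workers` helper showing how a launcher might derive a worker count from the adjusted register budget. The actual heuristic in nvptx_exec may differ.

```c
/* Hypothetical subset of the limits the plugin records per device;
   this is not libgomp's struct ptx_device.  */
struct dev_limits
{
  int cc_major;       /* Compute capability, major version.  */
  int cc_minor;       /* Compute capability, minor version.  */
  int regs_per_sm;    /* Registers per multiprocessor.  */
  int shared_per_sm;  /* Shared memory per multiprocessor, bytes.  */
};

/* Same rule as the patch: on compute capability 3.7 (K80), halve the
   reported per-multiprocessor register and shared-memory limits.  */
static void
adjust_for_k80 (struct dev_limits *d)
{
  if (d->cc_major == 3 && d->cc_minor == 7)
    {
      d->regs_per_sm /= 2;
      d->shared_per_sm /= 2;
    }
}

/* Hypothetical heuristic (not libgomp's nvptx_exec): treat each
   worker as one warp of WARP_SIZE threads using REGS_PER_THREAD
   registers each, and return how many workers fit in the register
   budget, capped at CAP.  */
static int
max_workers (const struct dev_limits *d, int regs_per_thread,
             int warp_size, int cap)
{
  int regs_per_worker = regs_per_thread * warp_size;
  if (regs_per_worker <= 0)
    return cap;
  int workers = d->regs_per_sm / regs_per_worker;
  return workers < cap ? workers : cap;
}
```

With illustrative limits such as `{3, 7, 131072, ...}`, 128 registers per thread, a warp size of 32 and a cap of 32, `max_workers` returns 32 before `adjust_for_k80` runs and 16 afterwards. Applying the correction once at device-open time, as the patch does, means every later consumer of these limits sees the adjusted values without repeating the compute-capability test.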