From patchwork Tue Sep 10 13:39:28 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs
X-Patchwork-Id: 1160334
From: Andrew Stubbs
Subject: [OG9, amdgcn, committed] Detect the actual number of hardware CUs
To: "gcc-patches@gcc.gnu.org"
Message-ID: <342135f4-2278-9bf6-1200-a22c37d7a6d5@codesourcery.com>
Date: Tue, 10 Sep 2019 14:39:28 +0100

This patch improves out-of-the-box benchmark results by ensuring that we
don't launch 64 gangs on a device that only has 60 compute units, such as
consumer Vega 20.

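To illustrate why the mismatch can hurt, here is a rough sketch of the
arithmetic. It assumes a deliberately simplified model that is not part of
the plugin: each gang occupies exactly one compute unit for its whole
duration, and all gangs take the same time. The function names are
hypothetical and real hardware scheduling is more nuanced.

```c
#include <assert.h>

/* Hypothetical helper: number of sequential "waves" needed to run GANGS
   gangs when at most CUS gangs can be resident at once (assumption: one
   gang per compute unit).  This is a ceiling division.  */
unsigned
waves_needed (unsigned gangs, unsigned cus)
{
  assert (cus > 0);
  return (gangs + cus - 1) / cus;
}

/* Hypothetical helper: compute units left idle during the final wave
   under the same simplified model.  */
unsigned
idle_cus_in_last_wave (unsigned gangs, unsigned cus)
{
  assert (cus > 0);
  unsigned rem = gangs % cus;
  return rem == 0 ? 0 : cus - rem;
}
```

Under these assumptions, 64 gangs on 60 CUs need two waves, with 56 CUs idle
during the second one, whereas 60 gangs on 60 CUs fit in a single wave.
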
It's not suitable for upstream mainline yet because we need to update
hsa.h with definitions from Radeon Open Compute Runtime (ROCr), but there
are license issues with that. We could extract them from the
documentation, but this is still on my TODO list.

Andrew

Detect number of GPU compute units.

2019-09-10  Andrew Stubbs

	libgomp/
	* plugin/plugin-gcn.c (HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT): Define.
	(dump_hsa_agent_info): Dump compute unit count.
	(get_cu_count): New function.
	(parse_target_attributes): Use get_cu_count for default gdims.
	(gcn_exec): Likewise.

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 6c00c81b588..9d03e4f9f5b 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -44,6 +44,11 @@
 #include "oacc-int.h"
 #include
 
+/* Additional definitions not in HSA 1.1.
+   FIXME: this needs to be updated in hsa.h for upstream, but the only source
+   right now is the ROCr source which may cause license issues.  */
+#define HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT 0xA002
+
 /* These probably won't be in elf.h for a while.  */
 #define R_AMDGPU_NONE		0
 #define R_AMDGPU_ABS32_LO	1	/* (S + A) & 0xFFFFFFFF  */
@@ -845,6 +850,14 @@ dump_hsa_agent_info (hsa_agent_t agent, void *data __attribute__((unused)))
   else
     HSA_DEBUG ("HSA_AGENT_INFO_DEVICE: FAILED\n");
 
+  uint32_t cu_count;
+  status = hsa_fns.hsa_agent_get_info_fn
+    (agent, HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT, &cu_count);
+  if (status == HSA_STATUS_SUCCESS)
+    HSA_DEBUG ("HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT: %u\n", cu_count);
+  else
+    HSA_DEBUG ("HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT: FAILED\n");
+
   uint32_t size;
   status = hsa_fns.hsa_agent_get_info_fn
     (agent, HSA_AGENT_INFO_WAVEFRONT_SIZE, &size);
@@ -2449,6 +2462,18 @@ init_kernel (struct kernel_info *kernel)
 			       "mutex");
 }
 
+static int
+get_cu_count (struct agent_info *agent)
+{
+  uint32_t cu_count;
+  hsa_status_t status = hsa_fns.hsa_agent_get_info_fn
+    (agent->id, HSA_AMD_AGENT_INFO_COMPUTE_UNIT_COUNT, &cu_count);
+  if (status == HSA_STATUS_SUCCESS)
+    return cu_count;
+  else
+    return 64;  /* The usual number for older devices.  */
+}
+
 /* Calculate the maximum grid size for OMP threads / OACC workers.
    This depends on the kernel's resource usage levels.  */
 
@@ -2527,8 +2552,8 @@ parse_target_attributes (void **input,
     }
 
   def->ndim = 3;
-  /* Fiji has 64 CUs.  */
-  def->gdims[0] = (gcn_teams > 0) ? gcn_teams : 64;
+  /* Fiji has 64 CUs, but Vega20 has 60.  */
+  def->gdims[0] = (gcn_teams > 0) ? gcn_teams : get_cu_count (agent);
   /* Each thread is 64 work items wide.  */
   def->gdims[1] = 64;
   /* A work group can have 16 wavefronts.  */
@@ -3308,7 +3333,7 @@ gcn_exec (struct kernel_info *kernel, size_t mapnum, void **hostaddrs,
      problem size, so let's do a reasonable number of single-worker gangs.
      64 gangs matches a typical Fiji device.  */
 
-  if (dims[0] == 0) dims[0] = 64; /* Gangs.  */
+  if (dims[0] == 0) dims[0] = get_cu_count (kernel->agent); /* Gangs.  */
   if (dims[1] == 0) dims[1] = 16; /* Workers.  */
 
   /* The incoming dimensions are expressed in terms of gangs, workers, and