From patchwork Tue Feb 1 07:21:20 2022
X-Patchwork-Submitter: Tom de Vries
X-Patchwork-Id: 1587103
Date: Tue, 1 Feb 2022 08:21:20 +0100
From: Tom de Vries
To: gcc-patches@gcc.gnu.org
Cc: Thomas Schwinge
Subject: [committed][libgomp, testsuite] Fix insufficient resources in test-cases
Message-ID: <20220201072118.GA6642@delia.home>

Hi,

When running libgomp test-case broadcast-many.c on an nvptx accelerator (T400,
driver version 470.86), I run into:
...
libgomp: The Nvidia accelerator has insufficient resources to launch \
  'main$_omp_fn$0' with num_workers = 32 and vector_length = 32; \
  recompile the program with 'num_workers = x and vector_length = y' on \
  that offloaded region or '-fopenacc-dim=:x:y' where x * y <= 896.
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/broadcast-many.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
  -O0 execution test
...

The error does not occur when using GOMP_NVPTX_JIT=-O0.

Fix this by using 896 / 32 == 28 workers for ACC_DEVICE_TYPE_nvidia.

Likewise for some other test-cases.

Tested libgomp on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[libgomp, testsuite] Fix insufficient resources in test-cases

libgomp/ChangeLog:

2022-01-27  Tom de Vries

	* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Reduce
	num_workers for nvidia accelerator to fix libgomp error 'insufficient
	resources'.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Same.

---
 libgomp/testsuite/libgomp.oacc-c-c++-common/broadcast-many.c  |  9 ++++++++-
 .../libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c     | 10 +++++++++-
 libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c     |  9 ++++++++-
 3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/broadcast-many.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/broadcast-many.c
index 37839edfb09..08e026960e6 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/broadcast-many.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/broadcast-many.c
@@ -5,6 +5,13 @@
 #include
 #include
 
+#if ACC_DEVICE_TYPE_nvidia
+/* To avoid 'libgomp: The Nvidia accelerator has insufficient resources'.  */
+#define NUM_WORKERS 28
+#else
+#define NUM_WORKERS 32
+#endif
+
 #define LOCAL(n) double n = input;
 #define LOCALS(n) LOCAL(n##1) LOCAL(n##2) LOCAL(n##3) LOCAL(n##4) \
 		  LOCAL(n##5) LOCAL(n##6) LOCAL(n##7) LOCAL(n##8)
@@ -23,7 +30,7 @@ int main (void)
   int ret;
   int input = 1;
 
-  #pragma acc parallel num_gangs(1) num_workers(32) copyout(ret)
+  #pragma acc parallel num_gangs(1) num_workers(NUM_WORKERS) copyout(ret)
   {
     int w = 0;
     LOCALS2(h);
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c
index c3cc12fa953..4c66dc7bfea 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c
@@ -1,5 +1,12 @@
 #include
 
+#if ACC_DEVICE_TYPE_nvidia
+/* To avoid 'libgomp: The Nvidia accelerator has insufficient resources'.  */
+#define NUM_WORKERS 24
+#else
+#define NUM_WORKERS 32
+#endif
+
 /* Test of reduction on both parallel and loop directives (workers and vectors
    together in gang-partitioned mode, float type, multiple reductions).  */
 
@@ -13,7 +20,8 @@ main (int argc, char *argv[])
   for (i = 0; i < 32768; i++)
     arr[i] = i % (32768 / 64);
 
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+  #pragma acc parallel \
+    num_gangs(32) num_workers(NUM_WORKERS) vector_length(32) \
     reduction(+:res) reduction(max:mres) copy(res, mres)
   {
     #pragma acc loop gang /* { dg-warning "nested loop in reduction needs reduction clause for 'm\?res'" "TODO" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
index c2fb922a7f1..b4fe2300581 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
@@ -181,6 +181,12 @@ void gwv_np_3()
   assert (res == hres);
 }
 
+#if ACC_DEVICE_TYPE_nvidia
+/* To avoid 'libgomp: The Nvidia accelerator has insufficient resources'.  */
+#define NUM_WORKERS 28
+#else
+#define NUM_WORKERS 32
+#endif
 /* Test of reduction on loop directive (gangs, workers and vectors, multiple
    non-private reduction variables, float type).  */
 
@@ -194,7 +200,7 @@ void gwv_np_4()
   for (i = 0; i < 32768; i++)
     arr[i] = i % (32768 / 64);
 
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32)
+  #pragma acc parallel num_gangs(32) num_workers(NUM_WORKERS) vector_length(32)
   {
     #pragma acc loop gang reduction(+:res) reduction(max:mres)
     for (j = 0; j < 32; j++)
@@ -235,6 +241,7 @@ void gwv_np_4()
   assert (mres == hmres);
 }
 
+#undef NUM_WORKERS
 /* Test of reduction on loop directive (vectors, private reduction variable).  */
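
For reference, the arithmetic behind the worker count in the commit message
(896 / 32 == 28) can be sketched as below.  This is only an illustration:
max_workers is a hypothetical helper, not part of libgomp or the testsuite,
and it assumes the limit has the form x * y <= limit as reported in the
libgomp error message for the kernel in question.

```c
#include <assert.h>

/* Hypothetical helper (not a libgomp API): return the largest num_workers
   such that num_workers * vector_length <= limit, where limit is the bound
   quoted in the 'insufficient resources' message.  */
static int
max_workers (int limit, int vector_length)
{
  assert (limit > 0 && vector_length > 0);
  /* Integer division rounds down, so the product never exceeds limit.  */
  return limit / vector_length;
}
```

With the values from the error message above, max_workers (896, 32) gives 28,
which matches the NUM_WORKERS value used for broadcast-many.c.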