From patchwork Sat May 4 21:21:43 2024
X-Patchwork-Submitter: Sandra Loosemore
X-Patchwork-Id: 1931421
From: Sandra Loosemore
To: gcc-patches@gcc.gnu.org
Cc: jakub@redhat.com, tburnus@baylibre.com
Subject: [PATCH 03/12] libgomp: runtime support for target_device selector
Date: Sat, 4 May 2024 15:21:43 -0600
Message-Id: <20240504212153.3561429-4-sloosemore@baylibre.com>
In-Reply-To: <20240504212153.3561429-1-sloosemore@baylibre.com>
References: <20240504212153.3561429-1-sloosemore@baylibre.com>

This patch implements the libgomp runtime support for the dynamic
target_device selector via the GOMP_evaluate_target_device function.

include/ChangeLog
	* cuda/cuda.h (CUdevice_attribute): Add definitions for
	CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR and
	CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR.

libgomp/ChangeLog
	* Makefile.am (libgomp_la_SOURCES): Add selector.c.
	* Makefile.in: Regenerate.
	* config/gcn/selector.c: New.
	* config/linux/selector.c: New.
	* config/linux/x86/selector.c: New.
	* config/nvptx/selector.c: New.
	* libgomp-plugin.h (GOMP_OFFLOAD_evaluate_device): New.
	* libgomp.h (struct gomp_device_descr): Add evaluate_device_func field.
	* libgomp.map (GOMP_5.1.3): New, add GOMP_evaluate_target_device.
	* libgomp.texi (OpenMP Context Selectors): Document dynamic selector
	matching of kind/arch/isa.
	* libgomp_g.h (GOMP_evaluate_current_device): New.
	(GOMP_evaluate_target_device): New.
	* oacc-host.c (host_evaluate_device): New.
	(host_openacc_exec): Initialize evaluate_device_func field to
	host_evaluate_device.
	* plugin/plugin-gcn.c (gomp_match_selectors): New.
	(gomp_match_isa): New.
	(GOMP_OFFLOAD_evaluate_device): New.
	* plugin/plugin-nvptx.c (struct ptx_device): Add compute_major and
	compute_minor fields.
	(nvptx_open_device): Read compute capability information from device.
	(gomp_match_selectors): New.
	(gomp_match_selector): New.
	(CHECK_ISA): New macro.
	(GOMP_OFFLOAD_evaluate_device): New.
	* selector.c: New.
	* target.c (GOMP_evaluate_target_device): New.
	(gomp_load_plugin_for_device): Load evaluate_device plugin function.

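For readers unfamiliar with the selector encoding used throughout this
patch: each of the kind/arch/isa arguments is a packed set of trait
property names, every name terminated by '\0', with one extra '\0'
closing the set.  The following is only a stand-alone sketch of that
matching rule, not code from the patch; the names match_selectors,
gpu_kinds and kind_matches_gpu are hypothetical and chosen for
illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-alone copy of the matcher, for illustration only.
   SELECTORS is a sequence of NUL-terminated names followed by an empty
   string; CHOICES is a NULL-terminated array of accepted names.
   Returns true only if every name in SELECTORS appears in CHOICES.  */
static bool
match_selectors (const char *selectors, const char **choices)
{
  while (*selectors != '\0')
    {
      bool match = false;
      for (int i = 0; !match && choices[i]; i++)
        match = !strcmp (selectors, choices[i]);
      if (!match)
        return false;
      /* Step over this name and its terminating NUL.  */
      selectors += strlen (selectors) + 1;
    }
  return true;
}

/* Example trait set, mirroring what the GPU plugins accept for KIND.  */
static const char *gpu_kinds[] = { "gpu", "nohost", NULL };

/* Example query: does a packed KIND set describe a GPU device?
   A null pointer means the selector did not mention KIND at all.  */
static bool
kind_matches_gpu (const char *packed_kinds)
{
  return packed_kinds == NULL || match_selectors (packed_kinds, gpu_kinds);
}
```

In this sketch the C literal "gpu\0nohost\0" carries two names, because
the compiler appends the final terminating '\0' itself.  A set such as
"gpu\0host\0" is rejected as a whole, since every listed name has to
match.
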
Co-Authored-By: Kwok Cheung Yeung
Co-Authored-By: Sandra Loosemore
---
 include/cuda/cuda.h                 |   2 +
 libgomp/Makefile.am                 |   2 +-
 libgomp/Makefile.in                 |   5 +-
 libgomp/config/gcn/selector.c       | 102 +++++++
 libgomp/config/linux/selector.c     |  65 +++++
 libgomp/config/linux/x86/selector.c | 406 ++++++++++++++++++++++++++++
 libgomp/config/nvptx/selector.c     |  77 ++++++
 libgomp/libgomp-plugin.h            |   2 +
 libgomp/libgomp.h                   |   1 +
 libgomp/libgomp.map                 |   5 +
 libgomp/libgomp.texi                |  18 +-
 libgomp/libgomp_g.h                 |   8 +
 libgomp/oacc-host.c                 |  11 +
 libgomp/plugin/plugin-gcn.c         |  52 ++++
 libgomp/plugin/plugin-nvptx.c       |  82 ++++++
 libgomp/selector.c                  |  64 +++++
 libgomp/target.c                    |  40 +++
 17 files changed, 936 insertions(+), 6 deletions(-)
 create mode 100644 libgomp/config/gcn/selector.c
 create mode 100644 libgomp/config/linux/selector.c
 create mode 100644 libgomp/config/linux/x86/selector.c
 create mode 100644 libgomp/config/nvptx/selector.c
 create mode 100644 libgomp/selector.c

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 0dca4b3a5c0..a775450df03 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -83,6 +83,8 @@ typedef enum {
   CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39,
   CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40,
   CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING = 41,
+  CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75,
+  CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76,
   CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82
 } CUdevice_attribute;
 
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 1871590596d..87658da2d5d 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -72,7 +72,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c error.c \
 	target.c splay-tree.c libgomp-plugin.c oacc-parallel.c oacc-host.c \
 	oacc-init.c oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c \
 	priority_queue.c affinity-fmt.c teams.c allocator.c oacc-profiling.c \
-	oacc-target.c target-indirect.c
+	oacc-target.c target-indirect.c selector.c
 
 include $(top_srcdir)/plugin/Makefrag.am
 
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 11480d6a953..30e57571404 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -219,7 +219,7 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo critical.lo \
 	oacc-parallel.lo oacc-host.lo oacc-init.lo oacc-mem.lo \
 	oacc-async.lo oacc-plugin.lo oacc-cuda.lo priority_queue.lo \
 	affinity-fmt.lo teams.lo allocator.lo oacc-profiling.lo \
-	oacc-target.lo target-indirect.lo $(am__objects_1)
+	oacc-target.lo target-indirect.lo selector.lo $(am__objects_1)
 libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
 AM_V_P = $(am__v_P_@AM_V@)
 am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -552,7 +552,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c env.c \
 	oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c \
 	oacc-async.c oacc-plugin.c oacc-cuda.c priority_queue.c \
 	affinity-fmt.c teams.c allocator.c oacc-profiling.c \
-	oacc-target.c target-indirect.c $(am__append_3)
+	oacc-target.c target-indirect.c selector.c $(am__append_3)
 
 # Nvidia PTX OpenACC plugin.
 @PLUGIN_NVPTX_TRUE@libgomp_plugin_nvptx_version_info = -version-info $(libtool_VERSION)
@@ -777,6 +777,7 @@ distclean-compile:
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/ptrlock.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/scope.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/sections.Plo@am__quote@
+@AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/selector.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/sem.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/single.Plo@am__quote@
 @AMDEP_TRUE@@am__include@ @am__quote@./$(DEPDIR)/splay-tree.Plo@am__quote@
diff --git a/libgomp/config/gcn/selector.c b/libgomp/config/gcn/selector.c
new file mode 100644
index 00000000000..7e099a00b97
--- /dev/null
+++ b/libgomp/config/gcn/selector.c
@@ -0,0 +1,102 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Mentor, a Siemens Business.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file contains an implementation of GOMP_evaluate_current_device for
+   an AMD GCN GPU.  */
+
+#include "libgomp.h"
+#include <string.h>
+
+/* The selectors are passed as strings, but are actually sets of multiple
+   trait property names, separated by '\0' and with an extra '\0' at
+   the end.  Match such a string SELECTORS against an array of strings
+   CHOICES, that is terminated by a null pointer.
+   Return true if every name in SELECTORS matches one of CHOICES.  */
+static bool
+gomp_match_selectors (const char *selectors, const char **choices)
+{
+  while (*selectors != '\0')
+    {
+      bool match = false;
+      for (int i = 0; !match && choices[i]; i++)
+        match = !strcmp (selectors, choices[i]);
+      if (!match)
+        return false;
+      selectors += strlen (selectors) + 1;
+    }
+  return true;
+}
+
+bool
+GOMP_evaluate_current_device (const char *kind, const char *arch,
+                              const char *isa)
+{
+  static const char *kind_choices[] = { "gpu", "nohost", NULL };
+  static const char *arch_choices[] = { "gcn", "amdgcn", NULL };
+  static const char *isa_choices[]
+    = {
+#ifdef __fiji__
+        "fiji", "gfx803",
+#endif
+#ifdef __gfx900__
+        "gfx900",
+#endif
+#ifdef __gfx906__
+        "gfx906",
+#endif
+#ifdef __gfx908__
+        "gfx908",
+#endif
+#ifdef __gfx90a__
+        "gfx90a",
+#endif
+#ifdef __gfx90c__
+        "gfx90c",
+#endif
+#ifdef __gfx1030__
+        "gfx1030",
+#endif
+#ifdef __gfx1036__
+        "gfx1036",
+#endif
+#ifdef __gfx1100__
+        "gfx1100",
+#endif
+#ifdef __gfx1103__
+        "gfx1103",
+#endif
+        NULL };
+
+  if (kind && !gomp_match_selectors (kind, kind_choices))
+    return false;
+
+  if (arch && !gomp_match_selectors (arch, arch_choices))
+    return false;
+
+  if (isa && !gomp_match_selectors (isa, isa_choices))
+    return false;
+
+  return true;
+}
diff --git a/libgomp/config/linux/selector.c b/libgomp/config/linux/selector.c
new file mode 100644
index 00000000000..064cb937ecc
--- /dev/null
+++ b/libgomp/config/linux/selector.c
@@ -0,0 +1,65 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Mentor, a Siemens Business.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file contains a generic implementation of
+   GOMP_evaluate_current_device when run on a Linux host.  */
+
+#include <string.h>
+#include "libgomp.h"
+
+/* The selectors are passed as strings, but are actually sets of multiple
+   trait property names, separated by '\0' and with an extra '\0' at
+   the end.  Match such a string SELECTORS against an array of strings
+   CHOICES, that is terminated by a null pointer.
+   Return true if every name in SELECTORS matches one of CHOICES.  */
+static bool
+gomp_match_selectors (const char *selectors, const char **choices)
+{
+  while (*selectors != '\0')
+    {
+      bool match = false;
+      for (int i = 0; !match && choices[i]; i++)
+        match = !strcmp (selectors, choices[i]);
+      if (!match)
+        return false;
+      selectors += strlen (selectors) + 1;
+    }
+  return true;
+}
+
+bool
+GOMP_evaluate_current_device (const char *kind, const char *arch,
+                              const char *isa)
+{
+  static const char *kind_choices[] = { "cpu", "host", NULL };
+
+  if (kind && !gomp_match_selectors (kind, kind_choices))
+    return false;
+
+  if (arch || isa)
+    return false;
+
+  return true;
+}
diff --git a/libgomp/config/linux/x86/selector.c b/libgomp/config/linux/x86/selector.c
new file mode 100644
index 00000000000..13cd2e14389
--- /dev/null
+++ b/libgomp/config/linux/x86/selector.c
@@ -0,0 +1,406 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Mentor, a Siemens Business.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file contains an implementation of GOMP_evaluate_current_device for
+   an x86/x64-based Linux host.  */
+
+#include <string.h>
+#include "libgomp.h"
+
+/* The selectors are passed as strings, but are actually sets of multiple
+   trait property names, separated by '\0' and with an extra '\0' at
+   the end.  Match such a string SELECTORS against an array of strings
+   CHOICES, that is terminated by a null pointer.
+   Return true if every name in SELECTORS matches one of CHOICES.  */
+static bool
+gomp_match_selectors (const char *selectors, const char **choices)
+{
+  while (*selectors != '\0')
+    {
+      bool match = false;
+      for (int i = 0; !match && choices[i]; i++)
+        match = !strcmp (selectors, choices[i]);
+      if (!match)
+        return false;
+      selectors += strlen (selectors) + 1;
+    }
+  return true;
+}
+
+bool
+GOMP_evaluate_current_device (const char *kind, const char *arch,
+                              const char *isa)
+{
+  static const char *kind_choices[] = { "cpu", "host", NULL };
+
+  static const char *arch_choices[]
+    = { "x86",
+        "ia32",
+#ifdef __x86_64__
+        "x86_64",
+#endif
+#ifdef __ILP32__
+        "x32",
+#endif
+        "i386",
+#ifdef __i486__
+        "i486",
+#endif
+#ifdef __i586__
+        "i586",
+#endif
+#ifdef __i686__
+        "i686",
+#endif
+        NULL };
+
+  static const char *isa_choices[]
+    = {
+#ifdef __WBNOINVD__
+        "wbnoinvd",
+#endif
+#ifdef __AVX512VP2INTERSECT__
+        "avx512vp2intersect",
+#endif
+#ifdef __MMX__
+        "mmx",
+#endif
+#ifdef __3dNOW__
+        "3dnow",
+#endif
+#ifdef __3dNOW_A__
+        "3dnowa",
+#endif
+#ifdef __SSE__
+        "sse",
+#endif
+#ifdef __SSE2__
+        "sse2",
+#endif
+#ifdef __SSE3__
+        "sse3",
+#endif
+#ifdef __SSSE3__
+        "ssse3",
+#endif
+#ifdef __SSE4_1__
+        "sse4.1",
+#endif
+#ifdef __SSE4_2__
+        "sse4",
+        "sse4.2",
+#endif
+#ifdef __AES__
+        "aes",
+#endif
+#ifdef __SHA__
+        "sha",
+#endif
+#ifdef __PCLMUL__
+        "pclmul",
+#endif
+#ifdef __AVX__
+        "avx",
+#endif
+#ifdef __AVX2__
+        "avx2",
+#endif
+#ifdef __AVX512F__
+        "avx512f",
+#endif
+#ifdef __AVX512ER__
+        "avx512er",
+#endif
+#ifdef __AVX512CD__
+        "avx512cd",
+#endif
+#ifdef __AVX512PF__
+        "avx512pf",
+#endif
+#ifdef __AVX512DQ__
+        "avx512dq",
+#endif
+#ifdef __AVX512BW__
+        "avx512bw",
+#endif
+#ifdef __AVX512VL__
+        "avx512vl",
+#endif
+#ifdef __AVX512VBMI__
+        "avx512vbmi",
+#endif
+#ifdef __AVX512IFMA__
+        "avx512ifma",
+#endif
+#ifdef __AVX5124VNNIW__
+        "avx5124vnniw",
+#endif
+#ifdef __AVX512VBMI2__
+        "avx512vbmi2",
+#endif
+#ifdef __AVX512VNNI__
+        "avx512vnni",
+#endif
+#ifdef __PCONFIG__
+        "pconfig",
+#endif
+#ifdef __SGX__
+        "sgx",
+#endif
+#ifdef __AVX5124FMAPS__
+        "avx5124fmaps",
+#endif
+#ifdef __AVX512BITALG__
+        "avx512bitalg",
+#endif
+#ifdef __AVX512VPOPCNTDQ__
+        "avx512vpopcntdq",
+#endif
+#ifdef __FMA__
+        "fma",
+#endif
+#ifdef __RTM__
+        "rtm",
+#endif
+#ifdef __SSE4A__
+        "sse4a",
+#endif
+#ifdef __FMA4__
+        "fma4",
+#endif
+#ifdef __XOP__
+        "xop",
+#endif
+#ifdef __LWP__
+        "lwp",
+#endif
+#ifdef __ABM__
+        "abm",
+#endif
+#ifdef __BMI__
+        "bmi",
+#endif
+#ifdef __BMI2__
+        "bmi2",
+#endif
+#ifdef __LZCNT__
+        "lzcnt",
+#endif
+#ifdef __TBM__
+        "tbm",
+#endif
+#ifdef __CRC32__
+        "crc32",
+#endif
+#ifdef __POPCNT__
+        "popcnt",
+#endif
+#ifdef __FSGSBASE__
+        "fsgsbase",
+#endif
+#ifdef __RDRND__
+        "rdrnd",
+#endif
+#ifdef __F16C__
+        "f16c",
+#endif
+#ifdef __RDSEED__
+        "rdseed",
+#endif
+#ifdef __PRFCHW__
+        "prfchw",
+#endif
+#ifdef __ADX__
+        "adx",
+#endif
+#ifdef __FXSR__
+        "fxsr",
+#endif
+#ifdef __XSAVE__
+        "xsave",
+#endif
+#ifdef __XSAVEOPT__
+        "xsaveopt",
+#endif
+#ifdef __PREFETCHWT1__
+        "prefetchwt1",
+#endif
+#ifdef __CLFLUSHOPT__
+        "clflushopt",
+#endif
+#ifdef __CLZERO__
+        "clzero",
+#endif
+#ifdef __XSAVEC__
+        "xsavec",
+#endif
+#ifdef __XSAVES__
+        "xsaves",
+#endif
+#ifdef __CLWB__
+        "clwb",
+#endif
+#ifdef __MWAITX__
+        "mwaitx",
+#endif
+#ifdef __PKU__
+        "pku",
+#endif
+#ifdef __RDPID__
+        "rdpid",
+#endif
+#ifdef __GFNI__
+        "gfni",
+#endif
+#ifdef __SHSTK__
+        "shstk",
+#endif
+#ifdef __VAES__
+        "vaes",
+#endif
+#ifdef __VPCLMULQDQ__
+        "vpclmulqdq",
+#endif
+#ifdef __MOVDIRI__
+        "movdiri",
+#endif
+#ifdef __MOVDIR64B__
+        "movdir64b",
+#endif
+#ifdef __WAITPKG__
+        "waitpkg",
+#endif
+#ifdef __CLDEMOTE__
+        "cldemote",
+#endif
+#ifdef __SERIALIZE__
+        "serialize",
+#endif
+#ifdef __PTWRITE__
+        "ptwrite",
+#endif
+#ifdef __AVX512BF16__
+        "avx512bf16",
+#endif
+#ifdef __AVX512FP16__
+        "avx512fp16",
+#endif
+#ifdef __ENQCMD__
+        "enqcmd",
+#endif
+#ifdef __TSXLDTRK__
+        "tsxldtrk",
+#endif
+#ifdef __AMX_TILE__
+        "amx-tile",
+#endif
+#ifdef __AMX_INT8__
+        "amx-int8",
+#endif
+#ifdef __AMX_BF16__
+        "amx-bf16",
+#endif
+#ifdef __LAHF_SAHF__
+        "sahf",
+#endif
+#ifdef __MOVBE__
+        "movbe",
+#endif
+#ifdef __UINTR__
+        "uintr",
+#endif
+#ifdef __HRESET__
+        "hreset",
+#endif
+#ifdef __KL__
+        "kl",
+#endif
+#ifdef __WIDEKL__
+        "widekl",
+#endif
+#ifdef __AVXVNNI__
+        "avxvnni",
+#endif
+#ifdef __AVXIFMA__
+        "avxifma",
+#endif
+#ifdef __AVXVNNIINT8__
+        "avxvnniint8",
+#endif
+#ifdef __AVXNECONVERT__
+        "avxneconvert",
+#endif
+#ifdef __CMPCCXADD__
+        "cmpccxadd",
+#endif
+#ifdef __AMX_FP16__
+        "amx-fp16",
+#endif
+#ifdef __PREFETCHI__
+        "prefetchi",
+#endif
+#ifdef __RAOINT__
+        "raoint",
+#endif
+#ifdef __AMX_COMPLEX__
+        "amx-complex",
+#endif
+#ifdef __AVXVNNIINT16__
+        "avxvnniint16",
+#endif
+#ifdef __SM3__
+        "sm3",
+#endif
+#ifdef __SHA512__
+        "sha512",
+#endif
+#ifdef __SM4__
+        "sm4",
+#endif
+#ifdef __EVEX512__
+        "evex512",
+#endif
+#ifdef __USER_MSR__
+        "usermsr",
+#endif
+#ifdef __AVX10_1_256__
+        "avx10.1-256",
+#endif
+#ifdef __AVX10_1_512__
+        "avx10.1-512",
+#endif
+#ifdef __APX_F__
+        "apxf",
+#endif
+        NULL };
+
+  if (kind && !gomp_match_selectors (kind, kind_choices))
+    return false;
+  if (arch && !gomp_match_selectors (arch, arch_choices))
+    return false;
+  if (isa && !gomp_match_selectors (isa, isa_choices))
+    return false;
+  return true;
+}
diff --git a/libgomp/config/nvptx/selector.c b/libgomp/config/nvptx/selector.c
new file mode 100644
index 00000000000..c1e81efca28
--- /dev/null
+++ b/libgomp/config/nvptx/selector.c
@@ -0,0 +1,77 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+   Contributed by Mentor, a Siemens Business.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file contains an implementation of GOMP_evaluate_current_device for
+   a Nvidia GPU.  */
+
+#include "libgomp.h"
+#include <string.h>
+
+static bool
+gomp_match_selectors (const char *selectors, const char **choices)
+{
+  while (*selectors != '\0')
+    {
+      bool match = false;
+      for (int i = 0; !match && choices[i]; i++)
+        match = !strcmp (selectors, choices[i]);
+      if (!match)
+        return false;
+      selectors += strlen (selectors) + 1;
+    }
+  return true;
+}
+
+bool
+GOMP_evaluate_current_device (const char *kind, const char *arch,
+                              const char *isa)
+{
+  static const char *kind_choices[] = { "gpu", "nohost", NULL };
+  static const char *arch_choices[] = { "nvptx", NULL };
+  static const char *isa_choices[]
+    = {
+        "sm_30",
+#if __PTX_SM__ >= 350
+        "sm_35",
+#endif
+#if __PTX_SM__ >= 530
+        "sm_53",
+#endif
+#if __PTX_SM__ >= 750
+        "sm_75",
+#endif
+#if __PTX_SM__ >= 800
+        "sm_80",
+#endif
+        NULL };
+
+  if (kind && !gomp_match_selectors (kind, kind_choices))
+    return false;
+  if (arch && !gomp_match_selectors (arch, arch_choices))
+    return false;
+  if (isa && !gomp_match_selectors (isa, isa_choices))
+    return false;
+  return true;
+}
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 0c9c28c65cf..73f880ffa2f 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -152,6 +152,8 @@ extern int GOMP_OFFLOAD_memcpy3d (int, int, size_t, size_t, size_t, void *,
 extern bool GOMP_OFFLOAD_can_run (void *);
 extern void GOMP_OFFLOAD_run (int, void *, void *, void **);
 extern void GOMP_OFFLOAD_async_run (int, void *, void *, void **, void *);
+extern bool GOMP_OFFLOAD_evaluate_device (int, const char *, const char *,
+                                          const char *);
 extern void GOMP_OFFLOAD_openacc_exec (void (*) (void *), size_t, void **,
                                        void **, unsigned *, void *);
 
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 089393846d1..4dad4bc321a 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1417,6 +1417,7 @@ struct gomp_device_descr
   __typeof (GOMP_OFFLOAD_can_run) *can_run_func;
   __typeof (GOMP_OFFLOAD_run) *run_func;
   __typeof (GOMP_OFFLOAD_async_run) *async_run_func;
+  __typeof (GOMP_OFFLOAD_evaluate_device) *evaluate_device_func;
 
   /* Splay tree containing information about mapped memory regions.  */
   struct splay_tree_s mem_map;
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 65901dff235..70a48874417 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -428,6 +428,11 @@ GOMP_5.1.2 {
 	GOMP_target_map_indirect_ptr;
 } GOMP_5.1.1;
 
+GOMP_5.1.3 {
+  global:
+	GOMP_evaluate_target_device;
+} GOMP_5.1.2;
+
 OACC_2.0 {
   global:
 	acc_get_num_devices;
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 71d62105a20..43048da4d6e 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6181,9 +6181,10 @@ smaller number. On non-host devices, the value of the
 @c has to be implemented; cf. also PR target/105640.
 @c For offload devices, add *additionally* gcc/config/*/t-omp-device.
 
-For the host compiler, @code{kind} always matches @code{host}; for the
-offloading architectures AMD GCN and Nvidia PTX, @code{kind} always matches
-@code{gpu}.  For the x86 family of computers, AMD GCN and Nvidia PTX
+For the host compiler, @code{kind} always matches @code{host} and @code{cpu};
+for the offloading architectures AMD GCN and Nvidia PTX, @code{kind}
+always matches @code{gpu} and @code{nohost}.
+For the x86 family of computers, AMD GCN and Nvidia PTX
 the following traits are supported in addition; while OpenMP is supported
 on more architectures, GCC currently does not match any @code{arch} or
 @code{isa} traits for those.
@@ -6200,6 +6201,17 @@ on more architectures, GCC currently does not match any @code{arch} or
       @tab See @code{-march=} in ``Nvidia PTX Options''
 @end multitable
 
+For x86, note that the set of matching @code{arch} and @code{isa}
+selectors is determined by command-line options rather than the actual
+hardware.  This is particularly true of dynamic selectors, which match
+the options used to build libgomp rather than the options used to
+build user programs (which may also differ between compilation units).
+
+For the @code{target_device} selector on AMD GCN and Nvidia PTX,
+the actual hardware is checked at run time.  On AMD GCN, an exact match
+of the @code{isa} selector is required, while on Nvidia PTX lower-numbered
+revisions also match.
+
 @node Memory allocation
 @section Memory allocation
 
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index c0cc03ae61f..e9d60238e2b 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -337,6 +337,11 @@ extern void GOMP_single_copy_end (void *);
 
 extern void GOMP_scope_start (uintptr_t *);
 
+/* selector.c */
+
+extern bool GOMP_evaluate_current_device (const char *, const char *,
+                                          const char *);
+
 /* target.c */
 
 extern void GOMP_target (int, void (*) (void *), const void *,
@@ -359,6 +364,9 @@ extern void GOMP_teams (unsigned int, unsigned int);
 extern bool GOMP_teams4 (unsigned int, unsigned int, unsigned int, bool);
 extern void *GOMP_target_map_indirect_ptr (void *);
 
+extern bool GOMP_evaluate_target_device (int, const char *, const char *,
+                                         const char *);
+
 /* teams.c */
 
 extern void GOMP_teams_reg (void (*) (void *), void *, unsigned, unsigned,
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 5efdf7fb796..b6883850250 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -136,6 +136,16 @@ host_run (int n __attribute__ ((unused)), void *fn_ptr, void *vars,
   fn (vars);
 }
 
+static bool
+host_evaluate_device (int device_num __attribute__ ((unused)),
+                      const char *kind __attribute__ ((unused)),
+                      const char *arch __attribute__ ((unused)),
+                      const char *isa __attribute__ ((unused)))
+{
+  __builtin_unreachable ();
+  return false;
+}
+
 static void
 host_openacc_exec (void (*fn) (void *),
                    size_t mapnum __attribute__ ((unused)),
@@ -285,6 +295,7 @@ static struct gomp_device_descr host_dispatch =
     .memcpy2d_func = NULL,
     .memcpy3d_func = NULL,
     .run_func = host_run,
+    .evaluate_device_func = host_evaluate_device,
 
     .mem_map = { NULL },
     .mem_map_rev = { NULL },
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 3cdc7ba929f..9d9f34cf767 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -4395,6 +4395,58 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
GOMP_PLUGIN_target_task_completion, async_data); } +/* The selectors are passed as strings, but are actually sets of multiple + trait property names, separated by '\0' and with an extra '\0' at + the end. Match such a string SELECTORS against an array of strings + CHOICES, that is terminated by a null pointer. + matches. */ +static bool +gomp_match_selectors (const char *selectors, const char **choices) +{ + while (*selectors != '\0') + { + bool match = false; + for (int i = 0; !match && choices[i]; i++) + match = !strcmp (selectors, choices[i]); + if (!match) + return false; + selectors += strlen (selectors) + 1; + } + return true; +} + +/* Here we can only have one possible match and it must be + the only selector provided. */ +static bool +gomp_match_isa (const char *selectors, gcn_isa isa) +{ + if (isa_code (selectors) != isa) + return false; + if (*(selectors + strlen (selectors) + 1) != '\0') + return false; + return true; +} + +bool +GOMP_OFFLOAD_evaluate_device (int device_num, const char *kind, + const char *arch, const char *isa) +{ + static const char *kind_choices[] = { "gpu", "nohost", NULL }; + static const char *arch_choices[] = { "gcn", "amdgcn", NULL }; + struct agent_info *agent = get_agent_info (device_num); + + if (kind && !gomp_match_selectors (kind, kind_choices)) + return false; + + if (arch && !gomp_match_selectors (arch, arch_choices)) + return false; + + if (isa && !gomp_match_isa (isa, agent->device_isa)) + return false; + + return true; +} + /* }}} */ /* {{{ OpenACC Plugin API */ diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c index 5aad3448a8d..5126720eb5c 100644 --- a/libgomp/plugin/plugin-nvptx.c +++ b/libgomp/plugin/plugin-nvptx.c @@ -317,6 +317,7 @@ struct ptx_device int max_threads_per_block; int max_threads_per_multiprocessor; int default_dims[GOMP_DIM_MAX]; + int compute_major, compute_minor; /* Length as used by the CUDA Runtime API ('struct cudaDeviceProp'). 
 */ char name[256]; @@ -541,6 +542,14 @@ nvptx_open_device (int n) for (int i = 0; i != GOMP_DIM_MAX; i++) ptx_dev->default_dims[i] = 0; + CUDA_CALL_ERET (NULL, cuDeviceGetAttribute, &pi, + CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, dev); + ptx_dev->compute_major = pi; + + CUDA_CALL_ERET (NULL, cuDeviceGetAttribute, &pi, + CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, dev); + ptx_dev->compute_minor = pi; + CUDA_CALL_ERET (NULL, cuDeviceGetName, ptx_dev->name, sizeof ptx_dev->name, dev); @@ -2314,3 +2323,76 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args) } /* TODO: Implement GOMP_OFFLOAD_async_run. */ + +/* The selectors are passed as strings, but are actually sets of multiple + trait property names, separated by '\0' and with an extra '\0' at + the end. Match such a string SELECTORS against an array of strings + CHOICES, that is terminated by a null pointer. + Return true if every selector matches one of the choices. */ +static bool +gomp_match_selectors (const char *selectors, const char **choices) +{ + while (*selectors != '\0') + { + bool match = false; + for (int i = 0; !match && choices[i]; i++) + match = !strcmp (selectors, choices[i]); + if (!match) + return false; + selectors += strlen (selectors) + 1; + } + return true; +} + +/* Here we can only have one possible match and it must be + the only selector provided. 
 */ +static bool +gomp_match_selector (const char *selectors, const char *choice) +{ + if (strcmp (selectors, choice) != 0) + return false; + if (*(selectors + strlen (selectors) + 1) != '\0') + return false; + return true; +} + +#define CHECK_ISA(major, minor) \ + if (device->compute_major >= major \ + && device->compute_minor >= minor \ + && gomp_match_selector (isa, "sm_"#major#minor)) \ + return true + +bool +GOMP_OFFLOAD_evaluate_device (int device_num, const char *kind, + const char *arch, const char *isa) +{ + static const char *kind_choices[] = { "gpu", "nohost", NULL }; + static const char *arch_choices[] = { "nvptx", NULL }; + if (kind && !gomp_match_selectors (kind, kind_choices)) + return false; + + if (arch && !gomp_match_selectors (arch, arch_choices)) + return false; + + if (!isa) + return true; + + struct ptx_device *device = ptx_devices[device_num]; + + CHECK_ISA (3, 0); + CHECK_ISA (3, 5); + CHECK_ISA (3, 7); + CHECK_ISA (5, 0); + CHECK_ISA (5, 2); + CHECK_ISA (5, 3); + CHECK_ISA (6, 0); + CHECK_ISA (6, 1); + CHECK_ISA (6, 2); + CHECK_ISA (7, 0); + CHECK_ISA (7, 2); + CHECK_ISA (7, 5); + CHECK_ISA (8, 0); + CHECK_ISA (8, 6); + + return false; +} diff --git a/libgomp/selector.c b/libgomp/selector.c new file mode 100644 index 00000000000..5b21e582844 --- /dev/null +++ b/libgomp/selector.c @@ -0,0 +1,64 @@ +/* Copyright (C) 2022 Free Software Foundation, Inc. + Contributed by Mentor, a Siemens Business. + + This file is part of the GNU Offloading and Multi Processing Library + (libgomp). + + Libgomp is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS + FOR A PARTICULAR PURPOSE. See the GNU General Public License for + more details. 
+ + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +/* This file contains a placeholder implementation of + GOMP_evaluate_current_device. */ + +#include "libgomp.h" + +/* The selectors are passed as strings, but are actually sets of multiple + trait property names, separated by '\0' and with an extra '\0' at + the end. Match such a string SELECTORS against an array of strings + CHOICES, that is terminated by a null pointer. + Return true if every selector matches one of the choices. */ +static bool +gomp_match_selectors (const char *selectors, const char **choices) +{ + while (*selectors != '\0') + { + bool match = false; + for (int i = 0; !match && choices[i]; i++) + match = !strcmp (selectors, choices[i]); + if (!match) + return false; + selectors += strlen (selectors) + 1; + } + return true; +} + +bool +GOMP_evaluate_current_device (const char *kind, const char *arch, + const char *isa) +{ + static const char *kind_choices[] = { "cpu", "host", NULL }; + + if (kind && !gomp_match_selectors (kind, kind_choices)) + return false; + + if (arch || isa) + return false; + + return true; +} diff --git a/libgomp/target.c b/libgomp/target.c index 5ec19ae489e..02973ee2f40 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -5092,6 +5092,45 @@ omp_pause_resource_all (omp_pause_resource_t kind) ialias (omp_pause_resource) ialias (omp_pause_resource_all) +bool +GOMP_evaluate_target_device (int device_num, const char *kind, + const char *arch, const char *isa) +{ + bool result = true; + + /* -2 is a magic number to indicate the device number was not specified; + in that case it's supposed to use the default device. 
*/ + if (device_num == -2) + device_num = omp_get_default_device (); + + if (kind && strcmp (kind, "any") == 0) + kind = NULL; + + gomp_debug (1, "%s: device_num = %u, kind=%s, arch=%s, isa=%s", + __FUNCTION__, device_num, kind, arch, isa); + + if (omp_get_device_num () == device_num) + result = GOMP_evaluate_current_device (kind, arch, isa); + else + { + if (!omp_is_initial_device ()) + /* Accelerators are not expected to know about other devices. */ + result = false; + else + { + struct gomp_device_descr *device = resolve_device (device_num, true); + if (device == NULL) + result = false; + else if (device->evaluate_device_func) + result = device->evaluate_device_func (device_num, kind, arch, + isa); + } + } + + gomp_debug (1, " -> %s\n", result ? "true" : "false"); + return result; +} + #ifdef PLUGIN_SUPPORT /* This function tries to load a plugin for DEVICE. Name of plugin is passed @@ -5144,6 +5183,7 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device, DLSYM (free); DLSYM (dev2host); DLSYM (host2dev); + DLSYM (evaluate_device); DLSYM_OPT (memcpy2d, memcpy2d); DLSYM_OPT (memcpy3d, memcpy3d); device->capabilities = device->get_caps_func ();