From patchwork Mon Jan 22 20:33:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kwok Cheung Yeung X-Patchwork-Id: 1889385 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TJhlv3Z2rz1yS7 for ; Tue, 23 Jan 2024 07:34:33 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id D1BFB3858414 for ; Mon, 22 Jan 2024 20:34:31 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa4.mentor.iphmx.com (esa4.mentor.iphmx.com [68.232.137.252]) by sourceware.org (Postfix) with ESMTPS id E0FEC3858D3C for ; Mon, 22 Jan 2024 20:34:04 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E0FEC3858D3C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org E0FEC3858D3C Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=68.232.137.252 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1705955648; cv=none; b=tlKK8nHmJs0qyYy/8SVi7dOJ5vZrhLfzWOSv44BN+i/WyanX1G6xzN8HUU0fkci+iPjG2BHMdzGOW4soO5P+c8Jjm4USRPedBdxuZ8KSkdECz7L8BiZVv9kaUtYZw1t32/L04WCQAtT8BsdsA1wjsaInWWo+8na36FVu+DoaKH0= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1705955648; c=relaxed/simple; bh=kZ18d2geJ6bjIg+Ak5LlQ2im4RNqq83MT+Q3tO/Cn4o=; h=Message-ID:Date:MIME-Version:Subject:From:To; b=GsUlekfdEHUe+GoZgYRgK7b8jImScGIQOrFkAltHr25hDjXC/v+uE6q+5j45iMBMdGwr/BPmHPN3tGS4XiD6Hw4z24vE6qX1KK5lDJteFmEpGj4dQJCEJ8NO6l0YvP7seTbS41iklEM5AJbCtHGcyKqrnM+KSeNbNBMznfXuzbA= ARC-Authentication-Results: i=1; server2.sourceware.org X-CSE-ConnectionGUID: +QvMkf8YQC2Z+vM3WgLiow== X-CSE-MsgGUID: Cm0bMQFeSB6B8DBt0GAaOg== X-IronPort-AV: E=Sophos;i="6.05,212,1701158400"; d="scan'208,223";a="28591982" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa4.mentor.iphmx.com with ESMTP; 22 Jan 2024 12:34:03 -0800 IronPort-SDR: 0fYZpPISlcUTAwh5L0Fg3dwmweVh9MVS9NXvnvnVre9t2o2hbzx7gbzbcb5uTsHaIlE63HxO0g 5s00Ai6l9FIpCpogh69XB3MinFMjT8McfCqwpjMJuOOG/0+V7HjUV2w14JSOl3s7xutNd6Gx3L viPRGmRJqZKTAVnM3RnrqqOTyg/OYqh+gai2vy9m9t+AoHHJ84jGfzRlpupM1lffE3jBg+GCSo xMmHd1IIe093wASFnFg5dgcfhG4rW753+kpqUQx76qhtdQtPqgj0pLMHNLNQbcBwqgk4h1N+EG /dQ= Message-ID: <94202e90-0519-4124-9438-3ea40f6145aa@codesourcery.com> Date: Mon, 22 Jan 2024 20:33:17 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird References: <37f412ee-58e7-4bde-a763-591268e8f8f4@codesourcery.com> Subject: [PATCH] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls Content-Language: en-GB From: Kwok Cheung Yeung To: gcc-patches , Jakub Jelinek , Tobias Burnus CC: In-Reply-To: <37f412ee-58e7-4bde-a763-591268e8f8f4@codesourcery.com> X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-13.mgc.mentorg.com (139.181.222.13) To svr-ies-mbx-12.mgc.mentorg.com (139.181.222.12) X-Spam-Status: No, score=-11.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Hi There was a bug in the declare-target-indirect-2.c libgomp testcase (testing indirect calls in offloaded target regions, spread over multiple teams/threads) that due to an errant fallthrough in a switch statement resulted in only one indirect function ever getting called: switch (i % 3) { case 0: fn_ptr[i] = &foo; // Missing break case 1: fn_ptr[i] = &bar; // Missing break case 2: fn_ptr[i] = &baz; } However, when the missing break statements are added, the testcase fails with an invalid memory access. Upon investigation, this is due to the use of a splay-tree as the lookup structure for indirect addresses, as the splay-tree moves frequently accessed elements closer to the root node and so needs locking when used from multiple threads. However, this would end up partially serialising all the threads and kill performance. I have switched the lookup structure from a splay tree to a hashtab instead to avoid locking during lookup. I have also tidied up the initialisation of the lookup table by calling it only from the first thread of the first team, instead of redundantly calling it from every thread and only having the first one reached do the initialisation. This removes the need for locking during initialisation. Tested with offloading to NVPTX and GCN with a x86_64 host. Okay for master? Thanks Kwok From 721ec33bec2fddc7ee37e227358e36fec923f8da Mon Sep 17 00:00:00 2001 From: Kwok Cheung Yeung Date: Wed, 17 Jan 2024 16:53:40 +0000 Subject: [PATCH 1/2] openmp: Change to using a hashtab to lookup offload target addresses for indirect function calls A splay-tree was previously used to lookup equivalent target addresses for a given host address on offload targets. However, as splay-trees can modify their structure on lookup, they are not suitable for concurrent access from separate teams/threads without some form of locking. This patch changes the lookup data structure to a hashtab instead, which does not have these issues. The call to build_indirect_map to initialize the data structure is now called from just the first thread of the first team to avoid redundant calls to this function. 2024-01-19 Kwok Cheung Yeung libgomp/ * config/accel/target-indirect.c: Include string.h and hashtab.h. Remove include of splay-tree.h. (splay_tree_prefix, splay_tree_c): Delete. (struct indirect_map_t): New. (hash_entry_type, htab_alloc, htab_free, htab_hash, htab_eq): New. (GOMP_INDIRECT_ADD_MAP): Remove volatile qualifier. (USE_SPLAY_TREE_LOOKUP): Rename to... (USE_HASHTAB_LOOKUP): ..this. (indirect_map, indirect_array): Delete. (indirect_htab): New. (build_indirect_map): Remove locking. Build indirect map using hashtab. (GOMP_target_map_indirect_ptr): Use indirect_htab to lookup target address. * config/gcn/team.c (gomp_gcn_enter_kernel): Call build_indirect_map from first thread of first team only. * config/nvptx/team.c (gomp_nvptx_main): Likewise. * testsuite/libgomp.c-c++-common/declare-target-indirect-2.c (main): Add missing break statements. --- libgomp/config/accel/target-indirect.c | 75 +++++++++++-------- libgomp/config/gcn/team.c | 7 +- libgomp/config/nvptx/team.c | 9 ++- .../declare-target-indirect-2.c | 14 ++-- 4 files changed, 59 insertions(+), 46 deletions(-) diff --git a/libgomp/config/accel/target-indirect.c b/libgomp/config/accel/target-indirect.c index c60fd547cb6..6dad85076d6 100644 --- a/libgomp/config/accel/target-indirect.c +++ b/libgomp/config/accel/target-indirect.c @@ -25,22 +25,43 @@ . */ #include +#include #include "libgomp.h" -#define splay_tree_prefix indirect -#define splay_tree_c -#include "splay-tree.h" +struct indirect_map_t +{ + void *host_addr; + void *target_addr; +}; + +typedef struct indirect_map_t *hash_entry_type; + +static inline void * htab_alloc (size_t size) { return gomp_malloc (size); } +static inline void htab_free (void *ptr) { free (ptr); } + +#include "hashtab.h" + +static inline hashval_t +htab_hash (hash_entry_type element) +{ + return hash_pointer (element->host_addr); +} -volatile void **GOMP_INDIRECT_ADDR_MAP = NULL; +static inline bool +htab_eq (hash_entry_type x, hash_entry_type y) +{ + return x->host_addr == y->host_addr; +} + +void **GOMP_INDIRECT_ADDR_MAP = NULL; /* Use a splay tree to lookup the target address instead of using a linear search. */ -#define USE_SPLAY_TREE_LOOKUP +#define USE_HASHTAB_LOOKUP -#ifdef USE_SPLAY_TREE_LOOKUP +#ifdef USE_HASHTAB_LOOKUP -static struct indirect_splay_tree_s indirect_map; -static indirect_splay_tree_node indirect_array = NULL; +static htab_t indirect_htab = NULL; /* Build the splay tree used for host->target address lookups. */ @@ -48,37 +69,29 @@ void build_indirect_map (void) { size_t num_ind_funcs = 0; - volatile void **map_entry; - static int lock = 0; /* == gomp_mutex_t lock; gomp_mutex_init (&lock); */ + void **map_entry; if (!GOMP_INDIRECT_ADDR_MAP) return; - gomp_mutex_lock (&lock); - - if (!indirect_array) + if (!indirect_htab) { /* Count the number of entries in the NULL-terminated address map. */ for (map_entry = GOMP_INDIRECT_ADDR_MAP; *map_entry; map_entry += 2, num_ind_funcs++); - /* Build splay tree for address lookup. */ - indirect_array = gomp_malloc (num_ind_funcs * sizeof (*indirect_array)); - indirect_splay_tree_node array = indirect_array; + /* Build hashtab for address lookup. */ + indirect_htab = htab_create (num_ind_funcs); map_entry = GOMP_INDIRECT_ADDR_MAP; - for (int i = 0; i < num_ind_funcs; i++, array++) + for (int i = 0; i < num_ind_funcs; i++, map_entry += 2) { - indirect_splay_tree_key k = &array->key; - k->host_addr = (uint64_t) *map_entry++; - k->target_addr = (uint64_t) *map_entry++; - array->left = NULL; - array->right = NULL; - indirect_splay_tree_insert (&indirect_map, array); + struct indirect_map_t element = { *map_entry, NULL }; + hash_entry_type *slot = htab_find_slot (&indirect_htab, &element, + INSERT); + *slot = (hash_entry_type) map_entry; } } - - gomp_mutex_unlock (&lock); } void * @@ -88,15 +101,11 @@ GOMP_target_map_indirect_ptr (void *ptr) if (!ptr) return ptr; - assert (indirect_array); - - struct indirect_splay_tree_key_s k; - indirect_splay_tree_key node = NULL; - - k.host_addr = (uint64_t) ptr; - node = indirect_splay_tree_lookup (&indirect_map, &k); + assert (indirect_htab); - return node ? (void *) node->target_addr : ptr; + struct indirect_map_t element = { ptr, NULL }; + hash_entry_type entry = htab_find (indirect_htab, &element); + return entry ? entry->target_addr : ptr; } #else diff --git a/libgomp/config/gcn/team.c b/libgomp/config/gcn/team.c index 61e9c616b67..bd3df448b52 100644 --- a/libgomp/config/gcn/team.c +++ b/libgomp/config/gcn/team.c @@ -52,14 +52,15 @@ gomp_gcn_enter_kernel (void) { int threadid = __builtin_gcn_dim_pos (1); - /* Initialize indirect function support. */ - build_indirect_map (); - if (threadid == 0) { int numthreads = __builtin_gcn_dim_size (1); int teamid = __builtin_gcn_dim_pos(0); + /* Initialize indirect function support. */ + if (teamid == 0) + build_indirect_map (); + /* Set up the global state. Every team will do this, but that should be harmless. */ gomp_global_icv.nthreads_var = 16; diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c index 0cf5dad39ca..d5361917a24 100644 --- a/libgomp/config/nvptx/team.c +++ b/libgomp/config/nvptx/team.c @@ -60,9 +60,6 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data) asm ("mov.u32 %0, %%tid.y;" : "=r" (tid)); asm ("mov.u32 %0, %%ntid.y;" : "=r" (ntids)); - /* Initialize indirect function support. */ - build_indirect_map (); - if (tid == 0) { gomp_global_icv.nthreads_var = ntids; @@ -74,6 +71,12 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data) nvptx_thrs = alloca (ntids * sizeof (*nvptx_thrs)); memset (nvptx_thrs, 0, ntids * sizeof (*nvptx_thrs)); + /* Initialize indirect function support. */ + unsigned int block_id; + asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id)); + if (block_id == 0) + build_indirect_map (); + /* Find the low-latency heap details .... */ uint32_t *shared_pool; uint32_t shared_pool_size = 0; diff --git a/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c index 9fe190efce8..545f1a9fcbf 100644 --- a/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c +++ b/libgomp/testsuite/libgomp.c-c++-common/declare-target-indirect-2.c @@ -17,17 +17,17 @@ int main (void) { switch (i % 3) { - case 0: fn_ptr[i] = &foo; - case 1: fn_ptr[i] = &bar; - case 2: fn_ptr[i] = &baz; + case 0: fn_ptr[i] = &foo; break; + case 1: fn_ptr[i] = &bar; break; + case 2: fn_ptr[i] = &baz; break; } expected += (*fn_ptr[i]) (); } -#pragma omp target teams distribute parallel for reduction(+: x) \ - map (to: fn_ptr) map (tofrom: x) - for (int i = 0; i < N; i++) - x += (*fn_ptr[i]) (); + #pragma omp target teams distribute parallel for \ + reduction (+: x) map (to: fn_ptr) map (tofrom: x) + for (int i = 0; i < N; i++) + x += (*fn_ptr[i]) (); return x - expected; }