Date: Mon, 24 Oct 2022 17:26:44 +0100
To: "gcc-patches@gcc.gnu.org"
From: Andrew Stubbs
Subject: [OG12 commit] amdgcn, libgomp: USM allocation update

I've committed this patch to the devel/omp/gcc-12 branch. I will have to fold it into my previous OpenMP memory management patch series when I repost it.

The patch changes the internal memory allocation method such that memory is allocated in the regular heap and then marked as "coarse-grained", as opposed to allocating coarse-grained memory in the first place. The difference is that this is CPU first, not GPU first, which is typically the right way around, especially when we are using this for all possible allocations.

Andrew

amdgcn, libgomp: USM allocation update

Allocate Unified Shared Memory via malloc and hsa_amd_svm_attributes_set, instead of hsa_allocate_memory.
This scheme should be more efficient for memory that is first accessed by the CPU.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED): New.
	(HSA_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT): New.
	(HSA_AMD_SVM_ATTRIB_GLOBAL_FLAG): New.
	(HSA_AMD_SVM_GLOBAL_FLAG_COARSE_GRAINED): New.
	(hsa_amd_svm_attribute_pair_t): New.
	(struct hsa_runtime_fn_info): Add hsa_amd_svm_attributes_set_fn.
	(dump_hsa_system_info): Dump HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED and
	HSA_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT.
	(DLSYM_OPT_FN): New.
	(init_hsa_runtime_functions): Add hsa_amd_svm_attributes_set.
	(GOMP_OFFLOAD_usm_alloc): Use malloc and hsa_amd_svm_attributes_set.
	(GOMP_OFFLOAD_usm_free): Use regular free.
	* testsuite/libgomp.c/usm-1.c: Add -mxnack=on for amdgcn.
	* testsuite/libgomp.c/usm-2.c: Likewise.
	* testsuite/libgomp.c/usm-3.c: Likewise.
	* testsuite/libgomp.c/usm-4.c: Likewise.

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index dd493f63912..4871a6a793b 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -113,6 +113,16 @@ struct gcn_thread
   int async;
 };
 
+/* TEMPORARY IMPORT, UNTIL hsa_ext_amd.h GETS UPDATED.  */
+const static int HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED = 0x201;
+const static int HSA_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT = 0x202;
+const static int HSA_AMD_SVM_ATTRIB_GLOBAL_FLAG = 0;
+const static int HSA_AMD_SVM_GLOBAL_FLAG_COARSE_GRAINED = 1;
+typedef struct hsa_amd_svm_attribute_pair_s {
+  uint64_t attribute;
+  uint64_t value;
+} hsa_amd_svm_attribute_pair_t;
+
 /* As an HSA runtime is dlopened, following structure defines function
    pointers utilized by the HSA plug-in.  */
 
@@ -195,6 +205,9 @@ struct hsa_runtime_fn_info
   hsa_status_t (*hsa_code_object_deserialize_fn)
     (void *serialized_code_object, size_t serialized_code_object_size,
      const char *options, hsa_code_object_t *code_object);
+  hsa_status_t (*hsa_amd_svm_attributes_set_fn)
+    (void* ptr, size_t size, hsa_amd_svm_attribute_pair_t* attribute_list,
+     size_t attribute_count);
 };
 
 /* Structure describing the run-time and grid properties of an HSA kernel
@@ -720,6 +733,24 @@ dump_hsa_system_info (void)
     }
   else
     GCN_WARNING ("HSA_SYSTEM_INFO_EXTENSIONS: FAILED\n");
+
+  bool svm_supported;
+  status = hsa_fns.hsa_system_get_info_fn
+    (HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED, &svm_supported);
+  if (status == HSA_STATUS_SUCCESS)
+    GCN_DEBUG ("HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED: %s\n",
+               (svm_supported ? "TRUE" : "FALSE"));
+  else
+    GCN_WARNING ("HSA_AMD_SYSTEM_INFO_SVM_SUPPORTED: FAILED\n");
+
+  bool svm_accessible;
+  status = hsa_fns.hsa_system_get_info_fn
+    (HSA_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT, &svm_accessible);
+  if (status == HSA_STATUS_SUCCESS)
+    GCN_DEBUG ("HSA_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT: %s\n",
+               (svm_accessible ? "TRUE" : "FALSE"));
+  else
+    GCN_WARNING ("HSA_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT: FAILED\n");
 }
 
 /* Dump information about the available hardware.  */
@@ -1361,6 +1392,8 @@ init_hsa_runtime_functions (void)
   hsa_fns.function##_fn = dlsym (handle, #function); \
   if (hsa_fns.function##_fn == NULL) \
     return false;
+#define DLSYM_OPT_FN(function) \
+  hsa_fns.function##_fn = dlsym (handle, #function);
   void *handle = dlopen (hsa_runtime_lib, RTLD_LAZY);
   if (handle == NULL)
     return false;
@@ -1395,6 +1428,7 @@ init_hsa_runtime_functions (void)
   DLSYM_FN (hsa_signal_load_acquire)
   DLSYM_FN (hsa_queue_destroy)
   DLSYM_FN (hsa_code_object_deserialize)
+  DLSYM_OPT_FN (hsa_amd_svm_attributes_set)
   return true;
 #undef DLSYM_FN
 }
@@ -3886,15 +3920,38 @@ static struct usm_splay_tree_s usm_map = { NULL };
 
 /* Allocate memory suitable for Unified Shared Memory.
 
-   In fact, AMD memory need only be "coarse grained", which target
-   allocations already are.  We do need to track allocations so that
-   GOMP_OFFLOAD_is_usm_ptr can look them up.  */
+   Normal heap memory is already enabled for USM, but by default it is "fine-
+   grained" memory, meaning that the GPU must access it via the system bus,
+   slowly.  Changing the page to "coarse-grained" mode means that the page
+   is migrated on-demand and can therefore be accessed quickly by both CPU and
+   GPU (although care should be taken to prevent thrashing the page back and
+   forth).
+
+   GOMP_OFFLOAD_alloc also allocates coarse-grained memory, but in that case
+   the initial location is GPU memory; this function returns system memory.
+
+   We record and track allocations so that GOMP_OFFLOAD_is_usm_ptr can look
+   them up.  */
 
 void *
 GOMP_OFFLOAD_usm_alloc (int device, size_t size)
 {
-  void *ptr = GOMP_OFFLOAD_alloc (device, size);
+  void *ptr = malloc (size);
+  if (!ptr || !hsa_fns.hsa_amd_svm_attributes_set_fn)
+    return ptr;
+
+  /* Register the heap allocation as coarse grained, which implies USM.  */
+  struct hsa_amd_svm_attribute_pair_s attr = {
+    HSA_AMD_SVM_ATTRIB_GLOBAL_FLAG,
+    HSA_AMD_SVM_GLOBAL_FLAG_COARSE_GRAINED
+  };
+  hsa_status_t status = hsa_fns.hsa_amd_svm_attributes_set_fn (ptr, size,
+                                                               &attr, 1);
+  if (status != HSA_STATUS_SUCCESS)
+    GOMP_PLUGIN_fatal ("Failed to allocate Unified Shared Memory;"
+                       " please update your drivers and/or kernel");
+
   /* Record the allocation for GOMP_OFFLOAD_is_usm_ptr.  */
   usm_splay_tree_node node = malloc (sizeof (struct usm_splay_tree_node_s));
   node->key.addr = ptr;
   node->key.size = size;
@@ -3918,7 +3975,8 @@ GOMP_OFFLOAD_usm_free (int device, void *ptr)
       free (node);
     }
 
-  return GOMP_OFFLOAD_free (device, ptr);
+  free (ptr);
+  return true;
 }
 
 /* True if the memory was allocated via GOMP_OFFLOAD_usm_alloc.  */
diff --git a/libgomp/testsuite/libgomp.c/usm-1.c b/libgomp/testsuite/libgomp.c/usm-1.c
index e73f1816f9a..f7bf897b839 100644
--- a/libgomp/testsuite/libgomp.c/usm-1.c
+++ b/libgomp/testsuite/libgomp.c/usm-1.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-require-effective-target omp_usm } */
+/* { dg-options "-foffload=amdgcn-amdhsa=-mxnack=on" { target offload_target_amdgcn } } */
 
 #include
 #include
diff --git a/libgomp/testsuite/libgomp.c/usm-2.c b/libgomp/testsuite/libgomp.c/usm-2.c
index 31f2bae7145..3f52adbd7e1 100644
--- a/libgomp/testsuite/libgomp.c/usm-2.c
+++ b/libgomp/testsuite/libgomp.c/usm-2.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-require-effective-target omp_usm } */
+/* { dg-options "-foffload=amdgcn-amdhsa=-mxnack=on" { target offload_target_amdgcn } } */
 
 #include
 #include
diff --git a/libgomp/testsuite/libgomp.c/usm-3.c b/libgomp/testsuite/libgomp.c/usm-3.c
index 2c78a0d8ced..225cba5fe58 100644
--- a/libgomp/testsuite/libgomp.c/usm-3.c
+++ b/libgomp/testsuite/libgomp.c/usm-3.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-require-effective-target omp_usm } */
+/* { dg-options "-foffload=amdgcn-amdhsa=-mxnack=on" { target offload_target_amdgcn } } */
 
 #include
 #include
diff --git a/libgomp/testsuite/libgomp.c/usm-4.c b/libgomp/testsuite/libgomp.c/usm-4.c
index 1ac5498f73f..d4addfc587a 100644
--- a/libgomp/testsuite/libgomp.c/usm-4.c
+++ b/libgomp/testsuite/libgomp.c/usm-4.c
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-require-effective-target omp_usm } */
+/* { dg-options "-foffload=amdgcn-amdhsa=-mxnack=on" { target offload_target_amdgcn } } */
 
 #include
 #include
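The CPU-first allocation flow that GOMP_OFFLOAD_usm_alloc follows after this patch can be modelled without a GPU. This is a minimal sketch, not the plugin code: it does not link against the HSA runtime, and status_t, attribute_pair_t, the flag constants, the svm_attributes_set_fn pointer and fake_svm_attributes_set are hypothetical stand-ins. A NULL function pointer models DLSYM_OPT_FN not finding hsa_amd_svm_attributes_set. Unlike the patch, which calls GOMP_PLUGIN_fatal when the attribute call fails, the sketch frees the block and returns NULL, and it leaves out the splay-tree bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the HSA status type, attribute pair and flag
   values; they model the control flow only and touch no real runtime.  */
typedef int status_t;
enum { STATUS_SUCCESS = 0, STATUS_ERROR = 1 };

typedef struct
{
  uint64_t attribute;
  uint64_t value;
} attribute_pair_t;

enum { ATTRIB_GLOBAL_FLAG = 0, GLOBAL_FLAG_COARSE_GRAINED = 1 };

/* Optional entry point; NULL models DLSYM_OPT_FN not finding the symbol.  */
typedef status_t (*svm_attributes_set_fn_t) (void *, size_t,
                                            attribute_pair_t *, size_t);
svm_attributes_set_fn_t svm_attributes_set_fn = NULL;

/* Hypothetical fake: records the requested flag instead of remapping pages.  */
uint64_t last_flag_value;

status_t
fake_svm_attributes_set (void *ptr, size_t size, attribute_pair_t *attrs,
                         size_t count)
{
  assert (attrs != NULL && count == 1);
  if (ptr == NULL || size == 0 || attrs[0].attribute != ATTRIB_GLOBAL_FLAG)
    return STATUS_ERROR;
  last_flag_value = attrs[0].value;
  return STATUS_SUCCESS;
}

/* CPU-first USM allocation: take ordinary heap memory, then, if the optional
   entry point exists, flag the range as coarse grained.  */
void *
usm_alloc_sketch (size_t size)
{
  void *ptr = malloc (size);
  if (ptr == NULL || svm_attributes_set_fn == NULL)
    return ptr;  /* No entry point: plain heap memory, as in the patch.  */

  attribute_pair_t attr = { ATTRIB_GLOBAL_FLAG, GLOBAL_FLAG_COARSE_GRAINED };
  if (svm_attributes_set_fn (ptr, size, &attr, 1) != STATUS_SUCCESS)
    {
      /* The patch calls GOMP_PLUGIN_fatal here; the sketch fails softly.  */
      free (ptr);
      return NULL;
    }
  return ptr;
}

/* The memory came from malloc, so a regular free releases it.  */
void
usm_free_sketch (void *ptr)
{
  free (ptr);
}
```

Treating a missing symbol as non-fatal mirrors the patch: an older runtime without hsa_amd_svm_attributes_set still gets usable (fine-grained) heap memory, while a runtime that has the symbol but rejects the request is treated as an error.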
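GOMP_OFFLOAD_usm_alloc records each (address, size) pair in a splay tree so that GOMP_OFFLOAD_is_usm_ptr can later recognise the pointer, and GOMP_OFFLOAD_usm_free frees its tracking node before releasing the memory. The sketch below models that bookkeeping with a small fixed-capacity array and a linear scan instead of libgomp's splay tree; usm_record_add, usm_record_remove, usm_is_tracked and MAX_USM_RECORDS are hypothetical names. It assumes that any address inside a recorded block counts as a match; the patch does not show the splay-tree comparison function, so whether the real lookup accepts interior pointers is not confirmed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-capacity table standing in for the splay tree.  */
#define MAX_USM_RECORDS 16

struct usm_record
{
  uintptr_t addr;
  size_t size;
};

static struct usm_record usm_records[MAX_USM_RECORDS];
static size_t usm_count;

/* Remember an allocation, like the node->key.addr/size assignments.  */
bool
usm_record_add (void *ptr, size_t size)
{
  assert (ptr != NULL);  /* Only successful allocations are recorded.  */
  if (usm_count == MAX_USM_RECORDS)
    return false;
  usm_records[usm_count].addr = (uintptr_t) ptr;
  usm_records[usm_count].size = size;
  usm_count++;
  return true;
}

/* Forget an allocation by its start address; returns false if unknown.  */
bool
usm_record_remove (void *ptr)
{
  for (size_t i = 0; i < usm_count; i++)
    if (usm_records[i].addr == (uintptr_t) ptr)
      {
        /* Order does not matter, so move the last record into the gap.  */
        usm_records[i] = usm_records[--usm_count];
        return true;
      }
  return false;
}

/* True if PTR lies in [addr, addr + size) for some recorded block.  */
bool
usm_is_tracked (const void *ptr)
{
  uintptr_t p = (uintptr_t) ptr;
  for (size_t i = 0; i < usm_count; i++)
    if (p >= usm_records[i].addr
        && p - usm_records[i].addr < usm_records[i].size)
      return true;
  return false;
}
```

A splay tree keeps lookups cheap when many allocations are live and the same few are queried repeatedly; the linear scan here is only reasonable for a handful of records.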