From patchwork Mon Aug 15 11:06:18 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chung-Lin Tang
X-Patchwork-Id: 1666416
Message-ID: <81de6e4c-b8e1-9bdd-f84d-18a0c0b9806f@codesourcery.com>
Date: Mon, 15 Aug 2022 19:06:18 +0800
From: Chung-Lin Tang
Subject: [PATCH, OpenMP, Fortran] requires unified_shared_memory 2/2: insert USM allocators into libgfortran
To: gcc-patches, Fortran List, Tobias Burnus, Andrew Stubbs, Catherine Moore, Jakub Jelinek
List-Id: Gcc-patches mailing list

Following the first libgfortran memory allocator preparation patch, this is
the actual patch that wires unified_shared_memory allocation into libgfortran.
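
For readers who have not looked at the preparation patch, the underlying mechanism can be sketched roughly as below. This is a minimal standalone illustration and not the code in the patch: every demo_* name is hypothetical, and a counting wrapper around malloc/free stands in for the OpenMP unified-shared-memory allocators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocator hooks.  They default to the C library, which is what a
   runtime without offloading would keep using.  All demo_* names are
   hypothetical and exist only for this sketch.  */
static void *(*demo_malloc) (size_t) = malloc;
static void (*demo_free) (void *) = free;

/* Number of allocations served by the replacement hook below.  */
static size_t demo_hooked_allocs = 0;

/* Stand-in for a unified-shared-memory allocator: it only counts calls
   and forwards to malloc.  */
void *
demo_counting_malloc (size_t n)
{
  demo_hooked_allocs++;
  return malloc (n);
}

/* Matching release routine for demo_counting_malloc.  */
void
demo_counting_free (void *p)
{
  free (p);
}

/* Swap both hooks together.  This has to happen before the first
   allocation that will later be released through demo_xfree; otherwise a
   block obtained from one allocator is handed to the other one's free.  */
void
demo_allocators_init (void *(*malloc_fn) (size_t), void (*free_fn) (void *))
{
  assert (malloc_fn != NULL && free_fn != NULL);
  demo_malloc = malloc_fn;
  demo_free = free_fn;
}

/* Wrappers in the style of xmalloc/xfree: the rest of the runtime
   allocates through these and never calls malloc/free directly.  */
void *
demo_xmalloc (size_t n)
{
  if (n == 0)
    n = 1;
  void *p = demo_malloc (n);
  if (p == NULL)
    abort ();
  return p;
}

void
demo_xfree (void *p)
{
  demo_free (p);
}

/* Accessor so callers can observe which allocator served a request.  */
size_t
demo_hooked_alloc_count (void)
{
  return demo_hooked_allocs;
}
```

Calling demo_allocators_init (demo_counting_malloc, demo_counting_free) before any demo_xmalloc call routes every later allocation and release through the replacement pair, which is the ordering property the rest of this mail is concerned with.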

In the current OpenMP requires implementation, the requires_mask is collected
through offload LTO processing and presented to libgomp when registering
offload images through GOMP_offload_register_ver() (called by the
mkoffload-generated constructor linked into the program binary).

This means that the only reliable place to access omp_requires_mask is in
GOMP_offload_register_ver. However, since it is called through an ELF
constructor in the *main program*, it runs later than the
libgfortran/runtime/main.c:init() constructor, and because some libgfortran
init actions there already allocate memory, this can cause deallocation
errors later, once the allocators have been switched.

Another issue is that CUDA appears to register some cleanup actions using
atexit(), which forces libgomp to register gomp_target_fini() using atexit()
as well (so that it runs before the underlying CUDA state disappears). The
same constraint applies to us here.

So to summarize, we need to:
(1) order libgfortran init actions after omp_requires_mask processing is done,
    and
(2) order libgfortran cleanup actions before gomp_target_fini, to properly
    deallocate memory without crashing.

This is why the patch adds a small new set of definitions, along with
callback-registering functions exported from libgomp to libgfortran: they
register libgfortran's init/fini actions for libgomp to run.

Inside GOMP_offload_register_ver, after omp_requires_mask processing is done,
we call into libgfortran through a new _gfortran_mem_allocators_init function
to insert the omp_alloc/omp_free/etc. based allocators into the Fortran
runtime when GOMP_REQUIRES_UNIFIED_SHARED_MEMORY is set.

All symbol references between libgfortran and libgomp are declared as weak
symbols. Tests of the weak symbols are also used to determine whether the
other library is present in the program.

A final issue is the case where we have an OpenMP program that does NOT have
offloading.
We cannot passively determine in libgomp/libgfortran whether offloading
exists or not; only the main program itself can, by checking whether the
hidden __OFFLOAD_TABLE__ exists. When we do init/fini libgomp callback
registering for OpenMP programs, those with no offloading will not have those
callbacks run properly (because no offload image is ever loaded).

Therefore the solution here is a constructor added into the crtoffloadend.o
fragment that does a "null" call of GOMP_offload_register_ver, solely to
trigger the post-offload_register callbacks when __OFFLOAD_TABLE__ is NULL
(and because of this, the crtoffloadend.o Makefile rule is adjusted to
compile with PIC).

I know this is a big pile of yarn wrt how the main program/libgomp/libgfortran
interact, but it's finally working. Again tested without regressions.

Preparing to commit to devel/omp/gcc-12, and seeking approval for mainline
when the requires patches are in.

Thanks,
Chung-Lin

2022-08-15  Chung-Lin Tang

	libgcc/
	* Makefile.in (crtoffloadend$(objext)): Add $(PICFLAG) to compile rule.
	* offloadstuff.c (GOMP_offload_register_ver): Add declaration.
	(__OFFLOAD_TABLE__): Add declaration of weak symbol.
	(init_non_offload): New function.

	libgfortran/
	* gfortran.map (GFORTRAN_13): New namespace.
	(_gfortran_mem_allocators_init): New name inside GFORTRAN_13.
	* libgfortran.h (mem_allocators_init): New exported declaration.
	* runtime/main.c (do_init): Rename from init, add run-once guard code.
	(cleanup): Add run-once guard code.
	(GOMP_post_offload_register_callback): Declare weak symbol.
	(GOMP_pre_gomp_target_fini_callback): Likewise.
	(init): New constructor to register offload callbacks, or call do_init
	when not OpenMP.
	* runtime/memory.c (gfortran_malloc): New pointer variable.
	(gfortran_calloc): Likewise.
	(gfortran_realloc): Likewise.
	(gfortran_free): Likewise.
	(mem_allocators_init): New function.
	(xmalloc): Use gfortran_malloc.
	(xmallocarray): Use gfortran_malloc.
	(xcalloc): Use gfortran_calloc.
	(xrealloc): Use gfortran_realloc.
	(xfree): Use gfortran_free.

	libgomp/
	* libgomp.map (GOMP_5.1.2): New version namespace.
	(GOMP_post_offload_register_callback): New name inside GOMP_5.1.2.
	(GOMP_pre_gomp_target_fini_callback): Likewise.
	* target.c (GOMP_DEFINE_CALLBACK_SET): New macro to define callback set.
	(post_offload_register): Define callback set for after offload image
	register.
	(pre_gomp_target_fini): Define callback set for before gomp_target_fini
	is called.
	(libgfortran_malloc_usm): New function.
	(libgfortran_calloc_usm): Likewise.
	(libgfortran_realloc_usm): Likewise.
	(libgfortran_free_usm): Likewise.
	(_gfortran_mem_allocators_init): Declare weak symbol.
	(gomp_libgfortran_omp_allocators_init): New function.
	(GOMP_offload_register_ver): Add handling of host_table == NULL,
	calling into libgfortran to set unified_shared_memory allocators, and
	execution of post_offload_register callbacks.
	(gomp_target_init): Register all pre_gomp_target_fini callbacks to run
	at end of main using atexit().

diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index 09b3ec8..70720cc 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -1045,8 +1045,9 @@ crtbeginT$(objext): $(srcdir)/crtstuff.c
 crtoffloadbegin$(objext): $(srcdir)/offloadstuff.c
 	$(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_BEGIN
 
+# crtoffloadend contains a constructor with calls to libgomp, so build as PIC.
 crtoffloadend$(objext): $(srcdir)/offloadstuff.c
-	$(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_END
+	$(crt_compile) $(CRTSTUFF_T_CFLAGS) $(PICFLAG) -c $< -DCRT_END
 
 crtoffloadtable$(objext): $(srcdir)/offloadstuff.c
 	$(crt_compile) $(CRTSTUFF_T_CFLAGS) -c $< -DCRT_TABLE
diff --git a/libgcc/offloadstuff.c b/libgcc/offloadstuff.c
index 10e1fe1..2edb681 100644
--- a/libgcc/offloadstuff.c
+++ b/libgcc/offloadstuff.c
@@ -63,6 +63,19 @@ const void *const __offload_vars_end[0]
   __attribute__ ((__used__, visibility ("hidden"),
                   section (OFFLOAD_VAR_TABLE_SECTION_NAME))) = { };
 
+extern void GOMP_offload_register_ver (unsigned, const void *, int,
+                                       const void *);
+extern const void *const __OFFLOAD_TABLE__[0] __attribute__ ((weak));
+static void __attribute__((constructor))
+init_non_offload (void)
+{
+  /* If an OpenMP program has no offloading, post-offload_register callbacks
+     that need to run will require a call to GOMP_offload_register_ver, in
+     order to properly trigger those callbacks during init.  */
+  if (__OFFLOAD_TABLE__ == NULL)
+    GOMP_offload_register_ver (0, NULL, 0, NULL);
+}
+
 #elif defined CRT_TABLE
 
 extern const void *const __offload_func_table[];
diff --git a/libgfortran/gfortran.map b/libgfortran/gfortran.map
index e0e795c..55d2a52 100644
--- a/libgfortran/gfortran.map
+++ b/libgfortran/gfortran.map
@@ -1759,3 +1759,8 @@ GFORTRAN_12 {
     _gfortran_transfer_real128_write;
 #endif
 } GFORTRAN_10.2;
+
+GFORTRAN_13 {
+  global:
+    _gfortran_mem_allocators_init;
+} GFORTRAN_12;
diff --git a/libgfortran/libgfortran.h b/libgfortran/libgfortran.h
index 0b893a5..e518b39 100644
--- a/libgfortran/libgfortran.h
+++ b/libgfortran/libgfortran.h
@@ -874,6 +874,11 @@ internal_proto(xrealloc);
 extern void xfree (void *);
 internal_proto(xfree);
 
+#ifndef LIBGFOR_MINIMAL
+extern void mem_allocators_init (void *, void *, void *, void *);
+export_proto(mem_allocators_init);
+#endif
+
 /* environ.c */
 
 extern void init_variables (void);
diff --git a/libgfortran/runtime/main.c b/libgfortran/runtime/main.c
index 5162a8f..8aa688e 100644
--- a/libgfortran/runtime/main.c
+++ b/libgfortran/runtime/main.c
@@ -61,9 +61,16 @@ get_args (int *argc, char ***argv)
 
 /* Initialize the runtime library.  */
 
-static void __attribute__((constructor))
-init (void)
+static void
+do_init (void)
 {
+#ifndef LIBGFOR_MINIMAL
+  static bool do_init_ran = false;
+  if (do_init_ran)
+    return;
+  do_init_ran = true;
+#endif
+
   /* Must be first */
   init_variables ();
 
@@ -82,5 +89,37 @@ init (void)
 static void __attribute__((destructor))
 cleanup (void)
 {
+#ifndef LIBGFOR_MINIMAL
+  static bool cleanup_ran = false;
+  if (cleanup_ran)
+    return;
+  cleanup_ran = true;
+#endif
+
   close_units ();
 }
+
+#ifndef LIBGFOR_MINIMAL
+extern void __attribute__((weak))
+GOMP_post_offload_register_callback (void (*func)(void));
+
+extern void __attribute__((weak))
+GOMP_pre_gomp_target_fini_callback (void (*func)(void));
+#endif
+
+static void __attribute__((constructor))
+init (void)
+{
+#ifndef LIBGFOR_MINIMAL
+  if (GOMP_post_offload_register_callback)
+    {
+      GOMP_post_offload_register_callback (do_init);
+      GOMP_pre_gomp_target_fini_callback (cleanup);
+      return;
+    }
+#endif
+
+  /* If libgomp is not present, then we can go ahead and call do_init
+     directly.  */
+  do_init ();
+}
diff --git a/libgfortran/runtime/memory.c b/libgfortran/runtime/memory.c
index cbcec7c..8bf5b60 100644
--- a/libgfortran/runtime/memory.c
+++ b/libgfortran/runtime/memory.c
@@ -26,6 +26,28 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #include "libgfortran.h"
 #include
 
+#ifndef LIBGFOR_MINIMAL
+static void * (*gfortran_malloc)(size_t) = malloc;
+static void * (*gfortran_calloc)(size_t, size_t) = calloc;
+static void * (*gfortran_realloc)(void *, size_t) = realloc;
+static void (*gfortran_free)(void *) = free;
+
+void
+mem_allocators_init (void *malloc_ptr, void *calloc_ptr,
+                     void *realloc_ptr, void *free_ptr)
+{
+  gfortran_malloc = malloc_ptr;
+  gfortran_calloc = calloc_ptr;
+  gfortran_realloc = realloc_ptr;
+  gfortran_free = free_ptr;
+}
+
+#else
+#define gfortran_malloc malloc
+#define gfortran_calloc calloc
+#define gfortran_realloc realloc
+#define gfortran_free free
+#endif
 
 void *
 xmalloc (size_t n)
@@ -35,7 +57,7 @@ xmalloc (size_t n)
   if (n == 0)
     n = 1;
 
-  p = malloc (n);
+  p = gfortran_malloc (n);
 
   if (p == NULL)
     os_error ("Memory allocation failed");
@@ -57,7 +79,7 @@ xmallocarray (size_t nmemb, size_t size)
       os_error ("Integer overflow in xmallocarray");
     }
 
-  p = malloc (prod);
+  p = gfortran_malloc (prod);
 
   if (!p)
     os_error ("Memory allocation failed in xmallocarray");
@@ -73,7 +95,7 @@ xcalloc (size_t nmemb, size_t size)
   if (!nmemb || !size)
     nmemb = size = 1;
 
-  void *p = calloc (nmemb, size);
+  void *p = gfortran_calloc (nmemb, size);
   if (!p)
     os_error ("Allocating cleared memory failed");
 
@@ -86,7 +108,7 @@ xrealloc (void *ptr, size_t size)
   if (size == 0)
     size = 1;
 
-  void *newp = realloc (ptr, size);
+  void *newp = gfortran_realloc (ptr, size);
   if (!newp)
     os_error ("Memory allocation failure in xrealloc");
 
@@ -96,5 +118,5 @@ xrealloc (void *ptr, size_t size)
 void
 xfree (void *ptr)
 {
-  free (ptr);
+  gfortran_free (ptr);
 }
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 5af5c2d..c3af75c 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -624,3 +624,9 @@ GOMP_PLUGIN_1.3 {
 	GOMP_PLUGIN_goacc_profiling_dispatch;
 	GOMP_PLUGIN_goacc_thread;
 } GOMP_PLUGIN_1.2;
+
+GOMP_5.1.2 {
+  global:
+	GOMP_post_offload_register_callback;
+	GOMP_pre_gomp_target_fini_callback;
+} GOMP_5.1.1;
diff --git a/libgomp/target.c b/libgomp/target.c
index 997b2aa..6a5c0bb 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2522,6 +2522,70 @@ gomp_requires_to_name (char *buf, size_t size, int requires_mask)
                   (p == buf ? "" : ", "));
 }
 
+/* Macro to define a callback set with a name, and routine to register
+   a callback function into set.  */
+#define GOMP_DEFINE_CALLBACK_SET(name)                          \
+  static unsigned int num_ ## name ## _callbacks = 0;           \
+  static void (*name ## _callbacks[4])(void);                   \
+  void GOMP_ ## name ## _callback (void (*fn)(void))            \
+  {                                                             \
+    if (num_ ## name ## _callbacks                              \
+        < (sizeof (name ## _callbacks)                          \
+           / sizeof (name ## _callbacks[0])))                   \
+      {                                                         \
+        name ## _callbacks[num_ ## name ## _callbacks] = fn;    \
+        num_ ## name ## _callbacks += 1;                        \
+      }                                                         \
+  }
+GOMP_DEFINE_CALLBACK_SET(post_offload_register)
+GOMP_DEFINE_CALLBACK_SET(pre_gomp_target_fini)
+#undef GOMP_DEFINE_CALLBACK_SET
+
+/* Routines to insert into libgfortran, under unified_shared_memory.  */
+static void *
+libgfortran_malloc_usm (size_t size)
+{
+  return omp_alloc (size, ompx_unified_shared_mem_alloc);
+}
+
+static void *
+libgfortran_calloc_usm (size_t n, size_t size)
+{
+  return omp_calloc (n, size, ompx_unified_shared_mem_alloc);
+}
+
+static void *
+libgfortran_realloc_usm (void *ptr, size_t size)
+{
+  return omp_realloc (ptr, size, ompx_unified_shared_mem_alloc,
+                      ompx_unified_shared_mem_alloc);
+}
+
+static void
+libgfortran_free_usm (void *ptr)
+{
+  omp_free (ptr, ompx_unified_shared_mem_alloc);
+}
+
+extern void __attribute__((weak))
+_gfortran_mem_allocators_init (void *, void *, void *, void *);
+
+static void
+gomp_libgfortran_omp_allocators_init (int omp_requires_mask)
+{
+  static bool init = false;
+  if (init)
+    return;
+  init = true;
+
+  if ((omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY)
+      && _gfortran_mem_allocators_init != NULL)
+    _gfortran_mem_allocators_init (libgfortran_malloc_usm,
+                                   libgfortran_calloc_usm,
+                                   libgfortran_realloc_usm,
+                                   libgfortran_free_usm);
+}
+
 /* This function should be called from every offload image while loading.
    It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of
    the target, and DATA.  */
@@ -2532,6 +2596,9 @@ GOMP_offload_register_ver (unsigned version, const void *host_table,
 {
   int i;
 
+  if (host_table == NULL)
+    goto end;
+
   if (GOMP_VERSION_LIB (version) > GOMP_VERSION)
     gomp_fatal ("Library too old for offload (version %u < %u)",
                 GOMP_VERSION, GOMP_VERSION_LIB (version));
@@ -2598,6 +2665,14 @@ GOMP_offload_register_ver (unsigned version, const void *host_table,
 
   num_offload_images++;
   gomp_mutex_unlock (&register_lock);
+
+  /* Call into libgfortran to initialize OpenMP memory allocators.  */
+  gomp_libgfortran_omp_allocators_init (omp_requires_mask);
+
+ end:
+  for (int i = 0; i < num_post_offload_register_callbacks; i++)
+    post_offload_register_callbacks[i] ();
+  num_post_offload_register_callbacks = 0;
 }
 
 /* Legacy entry point.  */
@@ -2710,7 +2785,7 @@ gomp_unload_device (struct gomp_device_descr *devicep)
   if (devicep->state == GOMP_DEVICE_INITIALIZED)
     {
       unsigned i;
-      
+
       /* Unload from device all images registered at the moment.  */
       for (i = 0; i < num_offload_images; i++)
         {
@@ -4570,6 +4645,13 @@ gomp_target_init (void)
   devices = devs;
   if (atexit (gomp_target_fini) != 0)
     gomp_fatal ("atexit failed");
+
+  /* Register 'pre_gomp_target_fini' callbacks to run before gomp_target_fini
+     during finalization.  */
+  for (int i = 0; i < num_pre_gomp_target_fini_callbacks; i++)
+    if (atexit (pre_gomp_target_fini_callbacks[i]) != 0)
+      gomp_fatal ("atexit failed");
+  num_pre_gomp_target_fini_callbacks = 0;
 }
 
 #else /* PLUGIN_SUPPORT */
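
Since the callback sets in target.c are generated through a macro, a reduced standalone sketch of the same idea may help when reviewing: a fixed-capacity registry whose entries run once, in registration order, and are then cleared. The demo_* names are hypothetical and this is not the libgomp code. Unlike the macro above, the sketch returns a status from the register function and rejects NULL; both are choices made only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Capacity of the set; the patch uses a fixed array of 4 entries.  */
#define DEMO_MAX_CALLBACKS 4

static void (*demo_callbacks[DEMO_MAX_CALLBACKS]) (void);
static unsigned int demo_num_callbacks = 0;

/* Register FN.  Returns 1 on success, 0 if FN is NULL or the set is full.
   (The patch silently drops a callback when the set is full; the return
   value is an addition for this sketch.)  */
int
demo_register_callback (void (*fn) (void))
{
  if (fn == NULL || demo_num_callbacks >= DEMO_MAX_CALLBACKS)
    return 0;
  demo_callbacks[demo_num_callbacks++] = fn;
  return 1;
}

/* Run every registered callback in registration order, then empty the
   set so that a later call runs nothing.  Returns the number of
   callbacks that were run.  */
unsigned int
demo_run_callbacks (void)
{
  unsigned int n = demo_num_callbacks;
  for (unsigned int i = 0; i < n; i++)
    demo_callbacks[i] ();
  demo_num_callbacks = 0;
  return n;
}

/* Hypothetical init action, standing in for a runtime's do_init.  */
static int demo_init_runs = 0;

void
demo_runtime_init (void)
{
  demo_init_runs++;
}

int
demo_init_run_count (void)
{
  return demo_init_runs;
}
```

Because the count is reset after the first run, later calls to the registration entry point (one per offload image) do not run the init callbacks again. The finalization side needs no such bookkeeping: functions registered with atexit() are called in reverse order of registration, so callbacks registered after gomp_target_fini run before it.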