diff mbox series

GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable (non-shared memory system) (was: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal)

Message ID 877cid6nb8.fsf@euler.schwinge.ddns.net
State New
Headers show
Series GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable (non-shared memory system) (was: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal) | expand

Commit Message

Thomas Schwinge March 8, 2024, 10:16 a.m. UTC
Hi!

So, attached here is now a different patch
"GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable (non-shared memory system)",
that takes a different approach re clarifying the two orthogonal aspects
that the 'GCN_SUPPRESS_HOST_FALLBACK' environment variable controls:
(a) the *original* meaning via 'HSA_SUPPRESS_HOST_FALLBACK', and
(b) the *additional*/*new* meaning to report as fatal certain errors
during device probing.

As you requested, (b) remains as it is (with just the diagnostic message
clarified).  Re (a):

On 2024-03-07T14:37:10+0100, I wrote:
> On 2024-03-07T12:43:07+0100, Tobias Burnus <tburnus@baylibre.com> wrote:
>> Thomas Schwinge wrote:
>>> [...] libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' [...]
>>>
>>> [...] originates in the libgomp HSA plugin, where the idea was -- in my
>>> understanding -- that you wouldn't have device code available for all
>>> 'fn_ptr's, and in that case transparently (shared-memory system!) do
>>> host-fallback execution.  Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set,
>>> you'd get those diagnosed.
>>>
>>> This has then been copied into the libgomp GCN plugin (see above).
>>> However, is it really still applicable there; don't we assume that we're
>>> generating device code for all relevant functions?

> And, one step back: how is (the original meaning of)
> 'suppress_host_fallback = false' even supposed to work on non-shared
> memory systems as currently implemented by the libgomp GCN plugin?

> [...] this whole concept of dynamic plugin-level
> host-fallback execution being in conflict with our current non-shared
> memory system configurations?

I therefore suggest to get rid of (a).

OK to push?


Grüße
 Thomas

Comments

Andrew Stubbs March 8, 2024, 12:42 p.m. UTC | #1
On 08/03/2024 10:16, Thomas Schwinge wrote:
> Hi!
> 
> So, attached here is now a different patch
> "GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable (non-shared memory system)",
> that takes a different approach re clarifying the two orthogonal aspects
> that the 'GCN_SUPPRESS_HOST_FALLBACK' environment variable controls:
> (a) the *original* meaning via 'HSA_SUPPRESS_HOST_FALLBACK', and
> (b) the *additional*/*new* meaning to report as fatal certain errors
> during device probing.
> 
> As you requested, (b) remains as it is (with just the diagnostic message
> clarified).  Re (a):
> 
> On 2024-03-07T14:37:10+0100, I wrote:
>> On 2024-03-07T12:43:07+0100, Tobias Burnus <tburnus@baylibre.com> wrote:
>>> Thomas Schwinge wrote:
>>>> [...] libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' [...]
>>>>
>>>> [...] originates in the libgomp HSA plugin, where the idea was -- in my
>>>> understanding -- that you wouldn't have device code available for all
>>>> 'fn_ptr's, and in that case transparently (shared-memory system!) do
>>>> host-fallback execution.  Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set,
>>>> you'd get those diagnosed.
>>>>
>>>> This has then been copied into the libgomp GCN plugin (see above).
>>>> However, is it really still applicable there; don't we assume that we're
>>>> generating device code for all relevant functions?
> 
>> And, one step back: how is (the original meaning of)
>> 'suppress_host_fallback = false' even supposed to work on non-shared
>> memory systems as currently implemented by the libgomp GCN plugin?
> 
>> [...] this whole concept of dynamic plugin-level
>> host-fallback execution being in conflict with our current non-shared
>> memory system configurations?
> 
> I therefore suggest to get rid of (a).
> 
> OK to push?

I wasn't aware that things could be broken when fallback-suppression 
*wasn't* set. I agree that we don't need that "feature".

As far as I knew this feature was merely an older implementation of the 
now-standard OMP_TARGET_OFFLOAD=mandatory with the additional advantage 
that we could make it do whatever we want for our test and debug needs 
(i.e. no target independent "smarts").

This patch looks good, thanks.

Andrew
diff mbox series

Patch

From 2a188021ca70fc1956ba78707fdec9dcca4f734d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tschwinge@baylibre.com>
Date: Thu, 7 Mar 2024 15:51:54 +0100
Subject: [PATCH] GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK'
 isn't applicable (non-shared memory system)

'GCN_SUPPRESS_HOST_FALLBACK' originated as 'HSA_SUPPRESS_HOST_FALLBACK' in the
libgomp HSA plugin, where the idea was -- in my understanding -- that you
wouldn't have device code available for all functions that may be called, and
in that case transparently (shared memory system!) do host-fallback execution.
Or, with 'HSA_SUPPRESS_HOST_FALLBACK' set, you'd get those diagnosed.

This has then been copied into the libgomp GCN plugin as
'GCN_SUPPRESS_HOST_FALLBACK'.  However, the original meaning isn't applicable
for the libgomp GCN plugin anymore: we assume that we're generating device code
for all relevant functions, and we're implementing a non-shared memory system,
where we cannot transparently do host-fallback execution for individual
functions.

However, 'GCN_SUPPRESS_HOST_FALLBACK' has gained an additional meaning, to
enforce a fatal error in case that 'libhsa-runtime64.so.1' can't be dynamically
loaded; keep that meaning.

	libgomp/
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_can_run): Don't consider
	'GCN_SUPPRESS_HOST_FALLBACK' anymore (assume always-'true').
	(init_hsa_context): Adjust 'GCN_SUPPRESS_HOST_FALLBACK' error
	message.
---
 libgomp/plugin/plugin-gcn.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 4b7ab5e83c5..7e141a85f31 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -1524,9 +1524,11 @@  init_hsa_context (void)
   init_environment_variables ();
   if (!init_hsa_runtime_functions ())
     {
-      GCN_WARNING ("Run-time could not be dynamically opened\n");
+      const char *msg = "Run-time could not be dynamically opened";
       if (suppress_host_fallback)
-	GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
+	GOMP_PLUGIN_fatal ("%s\n", msg);
+      else
+	GCN_WARNING ("%s\n", msg);
       return false;
     }
   status = hsa_fns.hsa_init_fn ();
@@ -3855,15 +3857,9 @@  GOMP_OFFLOAD_can_run (void *fn_ptr)
 
   init_kernel (kernel);
   if (kernel->initialization_failed)
-    goto failure;
+    GOMP_PLUGIN_fatal ("kernel initialization failed");
 
   return true;
-
-failure:
-  if (suppress_host_fallback)
-    GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed");
-  GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n");
-  return false;
 }
 
 /* Allocate memory on device N.  */
-- 
2.34.1