
[libgomp] Enable OpenACC GCN testing

Message ID 737bd55b-e1fb-97cf-2a50-837a39cb6075@codesourcery.com
State New
Series [libgomp] Enable OpenACC GCN testing

Commit Message

Andrew Stubbs Nov. 14, 2019, 4:36 p.m. UTC
Hi,

This patch adds some necessary bits to enable OpenACC testing for
amdgcn offloading.

The two "check_effective" procedures are not actually needed yet, but 
later patches to test cases will use them.
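
For illustration only (the "-DUSE_GCN" option below is just a made-up
placeholder), a later C test case might then guard on these new effective
targets with the usual DejaGnu directives, along these lines:

  /* Run only when an AMD GCN board is actually usable.  */
  /* { dg-require-effective-target openacc_amdgcn_accel_present } */
  /* Add an option only when GCN is the offload device selected by default.  */
  /* { dg-additional-options "-DUSE_GCN" { target openacc_amdgcn_accel_selected } } */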

OK to commit?

Thanks

Andrew

Comments

Jakub Jelinek Nov. 15, 2019, 12:21 p.m. UTC | #1
On Thu, Nov 14, 2019 at 04:36:38PM +0000, Andrew Stubbs wrote:
> This patch adds some necessary bits to enable OpenACC testing for amdgcn
> offloading.
> 
> The two "check_effective" procedures are not actually needed yet, but later
> patches to test cases will use them.
> 
> OK to commit?

I'm surprised by the set acc_mem_shared 0, I thought gcn is a shared memory
offloading target.

The patch is OpenACC only and what do I know about OpenACC? ;)
If Thomas is ok with it, the patch is good for trunk.

> 2019-11-14  Andrew Stubbs  <ams@codesourcery.com>
> 
> 	libgomp/
> 	* testsuite/lib/libgomp.exp (offload_target_to_openacc_device_type):
> 	Recognize amdgcn.
> 	(check_effective_target_openacc_amdgcn_accel_present): New proc.
> 	(check_effective_target_openacc_amdgcn_accel_selected): New proc.
> 	* testsuite/libgomp.oacc-c++/c++.exp: Set acc_mem_shared for amdgcn.
> 	* testsuite/libgomp.oacc-c/c.exp: Likewise.
> 	* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.

	Jakub
Andrew Stubbs Nov. 15, 2019, 12:38 p.m. UTC | #2
On 15/11/2019 12:21, Jakub Jelinek wrote:
> On Thu, Nov 14, 2019 at 04:36:38PM +0000, Andrew Stubbs wrote:
>> This patch adds some necessary bits to enable OpenACC testing for amdgcn
>> offloading.
>>
>> The two "check_effective" procedures are not actually needed yet, but later
>> patches to test cases will use them.
>>
>> OK to commit?
> 
> I'm surprised by the set acc_mem_shared 0, I thought gcn is a shared memory
> offloading target.

APUs, such as Carrizo, are shared memory. DGPUs, such as Fiji and Vega,
have their own memory. A DGPU can access host memory, provided that it 
has been set up just so, but that is very slow, and I don't know of a 
way to do that without still having to copy the program data into that 
special region.

> The patch is OpenACC only and what do I know about OpenACC? ;)
> If Thomas is ok with it, the patch is good for trunk.

Thanks anyway.

Andrew
Jakub Jelinek Nov. 15, 2019, 12:43 p.m. UTC | #3
On Fri, Nov 15, 2019 at 12:38:06PM +0000, Andrew Stubbs wrote:
> On 15/11/2019 12:21, Jakub Jelinek wrote:
> > On Thu, Nov 14, 2019 at 04:36:38PM +0000, Andrew Stubbs wrote:
> > > This patch adds some necessary bits to enable OpenACC testing for amdgcn
> > > offloading.
> > > 
> > > The two "check_effective" procedures are not actually needed yet, but later
> > > patches to test cases will use them.
> > > 
> > > OK to commit?
> > 
> > I'm surprised by the set acc_mem_shared 0, I thought gcn is a shared memory
> > offloading target.
> 
> APUs, such as Carrizo, are shared memory. DGPUs, such as Fiji and Vega, have
> their own memory. A DGPU can access host memory, provided that it has been
> set up just so, but that is very slow, and I don't know of a way to do that
> without still having to copy the program data into that special region.

Ah, that is going to be interesting e.g. for
#pragma omp requires unified_shared_memory
My plan was that for this the TU constructor would call some libgomp
library function that, in the end, would ensure when loading plugins that
only shared memory supporting plugins are loaded.  If the gcn plugin
will support shared memory only conditionally, we'll need some interfaces to
ensure that.
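
For reference, such a translation unit would look roughly like this -- a
minimal sketch only, not something from this patch:

  #include <stdlib.h>

  /* Every device this program offloads to must provide shared memory.  */
  #pragma omp requires unified_shared_memory

  int
  main (void)
  {
    int *p = malloc (sizeof *p);
    *p = 0;
    /* No map clause for the pointed-to data: under unified_shared_memory
       the device can access host memory directly.  */
    #pragma omp target
    *p = 42;
    return *p != 42;
  }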

	Jakub
Stubbs, Andrew Nov. 15, 2019, 12:59 p.m. UTC | #4
On 15/11/2019 12:43, Jakub Jelinek wrote:
>> APUs, such as Carrizo, are shared memory. DGPUs, such as Fiji and Vega, have
>> their own memory. A DGPU can access host memory, provided that it has been
>> set up just so, but that is very slow, and I don't know of a way to do that
>> without still having to copy the program data into that special region.
> 
> Ah, that is going to be interesting e.g. for
> #pragma omp requires unified_shared_memory
> My plan was that for this the TU constructor would call some libgomp
> library function that, in the end, would ensure when loading plugins that
> only shared memory supporting plugins are loaded.  If the gcn plugin
> will support shared memory only conditionally, we'll need some interfaces to
> ensure that.

This is a problem indeed. Right now the capability has to be declared up 
front, before the device has even been initialized. In theory, there 
could be both devices in a single machine, and that would be even worse, 
but then we don't support machines with heterogeneous DGPUs well either 
(it works fine for user programs that just use the GPU they're given).
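
For context, the "declared up front" part is the plugin's static capability
hook; very roughly (hook and flag names as in libgomp-plugin.h, though the
exact set the GCN plugin returns is only sketched here):

  #include "libgomp-plugin.h"

  /* Answered once, at plugin load time, before any device is initialized.  */
  unsigned int
  GOMP_OFFLOAD_get_caps (void)
  {
    /* A discrete GPU cannot honestly advertise GOMP_OFFLOAD_CAP_SHARED_MEM,
       and there is currently no way to change the answer per device later.  */
    return GOMP_OFFLOAD_CAP_OPENMP_400 | GOMP_OFFLOAD_CAP_OPENACC_200;
  }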

We've been basically ignoring it because a) we don't have any APUs to 
test with, and b) we're being paid to produce a toolchain for 
supercomputers that will never use APUs (nor mismatched DGPUs).

The APU can (pretend to) not use shared memory, of course, so it might 
Just Work, but I can't be sure.

There's also the problem of XNACK support, which we haven't 
implemented in the backend. That's specific to APUs, so we'll never need 
it. You can get some distance without it -- there was no mention of it in 
Honza's original port, which targeted Carrizo -- but I don't really know 
how far.

Really, if nobody with an APU wants to fix it, we should just remove the 
Carrizo support that is there.

Andrew
Thomas Schwinge Dec. 2, 2019, 2:19 p.m. UTC | #5
Hi Andrew!

Sorry for the delay.

On 2019-11-14T16:36:38+0000, Andrew Stubbs <ams@codesourcery.com> wrote:
> This patch adds some necessary bits to enable OpenACC testing for
> amdgcn offloading.

Generally, I'm in favor if you'd consider such a thing (that in principle
is just a copy/adapt of the existing cases) as obvious to commit (even
more so with your "amdgcn port" maintainer hat on), especially so given
that this has been/is blocking you, as Tobias told me more than once.

That said:

> --- a/libgomp/testsuite/lib/libgomp.exp
> +++ b/libgomp/testsuite/lib/libgomp.exp
> @@ -316,6 +316,9 @@ proc offload_target_to_openacc_device_type { offload_target } {
>  	nvptx* {
>  	    return "nvidia"
>  	}
> +	amdgcn* {
> +	    return "gcn"
> +	}
>  	default {
>  	    error "Unknown offload target: $offload_target"
>  	}

Maintain alphabetical sorting?

> @@ -444,3 +447,28 @@ proc check_effective_target_hsa_offloading_selected {} {
>  	check_effective_target_hsa_offloading_selected_nocache
>      }]
>  }
> +# Return 1 if at least one AMD GCN board is present.

Missing vertical space?

> +proc check_effective_target_openacc_amdgcn_accel_present { } {

> +proc check_effective_target_openacc_amdgcn_accel_selected { } {

I'd also have inserted these maintaining alphabetical sorting, but I see
the existing other stuff also doesn't, so no need to bother.

For all the following three:

> --- a/libgomp/testsuite/libgomp.oacc-c++/c++.exp
> +++ b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
> @@ -106,6 +106,9 @@ if { $lang_test_file_found } {
>  
>  		set acc_mem_shared 0
>  	    }
> +	    gcn {
> +		set acc_mem_shared 0
> +	    }
>  	    default {
>  		error "Unknown OpenACC device type: $openacc_device_type (offload target: $offload_target)"
>  	    }

> --- a/libgomp/testsuite/libgomp.oacc-c/c.exp
> +++ b/libgomp/testsuite/libgomp.oacc-c/c.exp
> @@ -69,6 +69,9 @@ foreach offload_target [concat [split $offload_targets ","] "disable"] {
>  
>  	    set acc_mem_shared 0
>  	}
> +	gcn {
> +	    set acc_mem_shared 0
> +	}
>  	default {
>  	    error "Unknown OpenACC device type: $openacc_device_type (offload target: $offload_target)"
>  	}

> --- a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
> @@ -94,6 +94,9 @@ if { $lang_test_file_found } {
>  
>  		set acc_mem_shared 0
>  	    }
> +	    gcn {
> +		set acc_mem_shared 0
> +	    }
>  	    default {
>  		error "Unknown OpenACC device type: $openacc_device_type (offload target: $offload_target)"
>  	    }

..., maintain alphabetical sorting, and please also include the
'untested' check/skip, as done in the 'nvidia' cases.


To record the review effort, please include "Reviewed-by: Thomas Schwinge
<thomas@codesourcery.com>" in the commit log, see
<https://gcc.gnu.org/wiki/Reviewed-by>.


Grüße
 Thomas
Thomas Schwinge Dec. 2, 2019, 2:23 p.m. UTC | #6
Hi!

On 2019-11-15T13:43:04+0100, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Nov 15, 2019 at 12:38:06PM +0000, Andrew Stubbs wrote:
>> On 15/11/2019 12:21, Jakub Jelinek wrote:
>> > I'm surprised by the set acc_mem_shared 0, I thought gcn is a shared memory
>> > offloading target.
>> 
>> APUs, such as Carrizo, are shared memory. DGPUs, such as Fiji and Vega, have
>> their own memory. A DGPU can access host memory, provided that it has been
>> set up just so, but that is very slow, and I don't know of a way to do that
>> without still having to copy the program data into that special region.

For a few years already, Nvidia GPUs/drivers have been supporting what
they call Unified Memory, where the driver/kernel automatically handles
the movement of memory pages between host/device memories.  Given some
reasonable pre-fetching logic (either automatic in the driver/kernel, or
"guided" by the compiler/runtime), this reportedly achieves good
performance -- or even better performance than manually-managed memory
copying, as really only the data pages accessed (plus pre-fetched) will
be copied.
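
As a concrete illustration -- a minimal C sketch against the CUDA runtime
API, not anything GCC generates itself:

  #include <cuda_runtime.h>

  int
  main (void)
  {
    int *data;
    /* One allocation usable from both host and device; the driver migrates
       pages on demand instead of requiring explicit copies.  */
    cudaMallocManaged ((void **) &data, 1024 * sizeof *data,
		       cudaMemAttachGlobal);
    for (int i = 0; i < 1024; i++)
      data[i] = i;	/* Touched on the host...  */
    /* ...and a kernel launched here could read and write 'data' directly,
       with no explicit cudaMemcpy calls.  */
    cudaFree (data);
    return 0;
  }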

For example, see <https://dl.acm.org/citation.cfm?id=3356141> "Compiler
assisted hybrid implicit and explicit GPU memory management under unified
address space", which I've recently (SuperComputing 2019) have seen
presented, or other publications.

This is not currently implemented in GCC, but could/should be at some
point.

This (or even a mixture of manual-discrete/automatic-shared?) would then
be an execution mode of libgomp/plugin, selected at run-time?

> Ah, that is going to be interesting e.g. for
> #pragma omp requires unified_shared_memory
> My plan was that for this the TU constructor would call some libgomp
> library function that, in the end, would ensure when loading plugins that
> only shared memory supporting plugins are loaded.  If the gcn plugin
> will support shared memory only conditionally, we'll need some interfaces to
> ensure that.

(I have not yet completely digested the relevant OpenMP features.)


Grüße
 Thomas
Andrew Stubbs Dec. 2, 2019, 2:41 p.m. UTC | #7
On 02/12/2019 14:23, Thomas Schwinge wrote:
> Hi!
> 
> On 2019-11-15T13:43:04+0100, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Fri, Nov 15, 2019 at 12:38:06PM +0000, Andrew Stubbs wrote:
>>> On 15/11/2019 12:21, Jakub Jelinek wrote:
>>>> I'm surprised by the set acc_mem_shared 0, I thought gcn is a shared memory
>>>> offloading target.
>>>
>>> APUs, such as Carrizo, are shared memory. DGPUs, such as Fiji and Vega, have
>>> their own memory. A DGPU can access host memory, provided that it has been
>>> set up just so, but that is very slow, and I don't know of a way to do that
>>> without still having to copy the program data into that special region.
> 
> For a few years already, Nvidia GPUs/drivers have been supporting what
> they call Unified Memory, where the driver/kernel automatically handles
> the movement of memory pages between host/device memories.  Given some
> reasonable pre-fetching logic (either automatic in the driver/kernel, or
> "guided" by the compiler/runtime), this reportedly achieves good
> performance -- or even better performance than manually-managed memory
> copying, as really only the data pages accessed (plus pre-fetched) will
> be copied.

Yeah, this is not that.  When the AMD GPU accesses host memory it 
appears to bypass both L1 and L2 caching.  There's no copying, just 
direct, on-demand accesses.  This makes the performance really bad. We 
use it only for message passing, which is probably the original intent.

> For example, see <https://dl.acm.org/citation.cfm?id=3356141> "Compiler
> assisted hybrid implicit and explicit GPU memory management under unified
> address space", which I've recently (SuperComputing 2019) have seen
> presented, or other publications.
> 
> This is not currently implemented in GCC, but could/should be at some
> point.
> 
> This (or even a mixture of manual-discrete/automatic-shared?) would then
> be an execution mode of libgomp/plugin, selected at run-time?

All we really need from libgomp, to support AMD APUs, is to be able to 
toggle the shared memory mode dynamically, rather than having it baked 
into the capabilities at start-up. Probably we could figure out the 
capabilities at run-time already, but that would break when a system has 
both kinds of device. Anyway, this is theoretical as I have no intention 
to implement support for such devices.

Andrew
Andrew Stubbs Dec. 3, 2019, 12:56 p.m. UTC | #8
On 02/12/2019 14:19, Thomas Schwinge wrote:
> Generally, I'm in favor if you'd consider such a thing (that in principle
> is just a copy/adapt of the existing cases) as obvious to commit (even
> more so with your "amdgcn port" maintainer hat on), especially so given
> that this has been/is blocking you, as Tobias told me more than once.

I probably will do for incremental tweaks, but I left some stuff in this 
one to see if you were awake (that's my story and I'm sticking to it).

Here's what I have committed (and I just realized I forgot the newfangled 
Reviewed-by tag, sorry).

Andrew
Thomas Schwinge Dec. 3, 2019, 1:31 p.m. UTC | #9
Hi!

On 2019-12-03T12:56:49+0000, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 02/12/2019 14:19, Thomas Schwinge wrote:
>> Generally, I'm in favor if you'd consider such a thing (that in principle
>> is just a copy/adapt of the existing cases) as obvious to commit (even
>> more so with your "amdgcn port" maintainer hat on), especially so given
>> that this has been/is blocking you, as Tobias told me more than once.
>
> I probably will do for incremental tweaks, but I left some stuff in this 
> one to see if you were awake (that's my story and I'm sticking to it).

Haha!  ;-P


> Here's what I have committed (and I just realized I forgot the newfangled
> Reviewed-by tag, sorry).

No worries.  That's just an experiment anyway -- seems to work/have been
picked up for glibc, but not so much for GCC.


About the alphabetic sorting that I've mentioned: sometimes we do that,
sometimes we sort per 'GOMP_DEVICE_*'/'acc_device_t'/whatever ordering,
sometimes special cases go to the beginning/end, sometimes it's "in order
of appearance", or seemingly random, or any combination of all these.  So
there isn't a general guideline to follow.  However, to make reading the
code easier/more enjoyable, what I like to see is at least some kind of
local consistency, instead of just always adding new stuff to the
end. (Where the latter might indeed be the right thing to do in certain
cases, given that 'GOMP_DEVICE_GCN' is the last one in the list.)


Grüße
 Thomas
Thomas Schwinge June 8, 2021, 9:17 a.m. UTC | #10
Hi!

On 2019-11-14T16:36:38+0000, Andrew Stubbs <ams@codesourcery.com> wrote:
> This patch adds some necessary bits to enable OpenACC testing for
> amdgcn offloading.

> --- a/libgomp/testsuite/lib/libgomp.exp
> +++ b/libgomp/testsuite/lib/libgomp.exp

> +# Return 1 if at least one AMD GCN board is present, and the AMD GCN device
> +# type is selected by default.
> +
> +proc check_effective_target_openacc_amdgcn_accel_selected { } {
> +    if { ![check_effective_target_openacc_amdgcn_accel_present] } {
> +     return 0;
> +    }
> +    global offload_target
> +    if { [string match "amdgcn*" $offload_target] } {
> +        return 1;
> +    }
> +    return 0;
> +}

Pushed "[GCN] Streamline
'libgomp/testsuite/lib/libgomp.exp:check_effective_target_openacc_radeon_accel_selected'"
to master branch in commit f9da798ba6348feaada80de04bc72cdf0c4a1f70, see
attached.


Grüße
 Thomas



Patch

Enable OpenACC GCN testing.

2019-11-14  Andrew Stubbs  <ams@codesourcery.com>

	libgomp/
	* testsuite/lib/libgomp.exp (offload_target_to_openacc_device_type):
	Recognize amdgcn.
	(check_effective_target_openacc_amdgcn_accel_present): New proc.
	(check_effective_target_openacc_amdgcn_accel_selected): New proc.
	* testsuite/libgomp.oacc-c++/c++.exp: Set acc_mem_shared for amdgcn.
	* testsuite/libgomp.oacc-c/c.exp: Likewise.
	* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 14d9b5f1305..802c6ea275f 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -316,6 +316,9 @@  proc offload_target_to_openacc_device_type { offload_target } {
 	nvptx* {
 	    return "nvidia"
 	}
+	amdgcn* {
+	    return "gcn"
+	}
 	default {
 	    error "Unknown offload target: $offload_target"
 	}
@@ -444,3 +447,28 @@  proc check_effective_target_hsa_offloading_selected {} {
 	check_effective_target_hsa_offloading_selected_nocache
     }]
 }
+# Return 1 if at least one AMD GCN board is present.
+
+proc check_effective_target_openacc_amdgcn_accel_present { } {
+    return [check_runtime openacc_amdgcn_accel_present {
+	#include <openacc.h>
+	int main () {
+	return !(acc_get_num_devices (acc_device_gcn) > 0);
+	}
+    } "" ]
+}
+
+# Return 1 if at least one AMD GCN board is present, and the AMD GCN device
+# type is selected by default.
+
+proc check_effective_target_openacc_amdgcn_accel_selected { } {
+    if { ![check_effective_target_openacc_amdgcn_accel_present] } {
+	return 0;
+    }
+    global offload_target
+    if { [string match "amdgcn*" $offload_target] } {
+        return 1;
+    }
+    return 0;
+}
+
diff --git a/libgomp/testsuite/libgomp.oacc-c++/c++.exp b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
index dcefa792ca4..b2502a5f94c 100644
--- a/libgomp/testsuite/libgomp.oacc-c++/c++.exp
+++ b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
@@ -106,6 +106,9 @@  if { $lang_test_file_found } {
 
 		set acc_mem_shared 0
 	    }
+	    gcn {
+		set acc_mem_shared 0
+	    }
 	    default {
 		error "Unknown OpenACC device type: $openacc_device_type (offload target: $offload_target)"
 	    }
diff --git a/libgomp/testsuite/libgomp.oacc-c/c.exp b/libgomp/testsuite/libgomp.oacc-c/c.exp
index 55cd40f1e99..420237eafe9 100644
--- a/libgomp/testsuite/libgomp.oacc-c/c.exp
+++ b/libgomp/testsuite/libgomp.oacc-c/c.exp
@@ -69,6 +69,9 @@  foreach offload_target [concat [split $offload_targets ","] "disable"] {
 
 	    set acc_mem_shared 0
 	}
+	gcn {
+	    set acc_mem_shared 0
+	}
 	default {
 	    error "Unknown OpenACC device type: $openacc_device_type (offload target: $offload_target)"
 	}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
index 852f372b319..1a731dd4c24 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
+++ b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
@@ -94,6 +94,9 @@  if { $lang_test_file_found } {
 
 		set acc_mem_shared 0
 	    }
+	    gcn {
+		set acc_mem_shared 0
+	    }
 	    default {
 		error "Unknown OpenACC device type: $openacc_device_type (offload target: $offload_target)"
 	    }