
[libgomp] Enable OpenACC GCN testing

Message ID 737bd55b-e1fb-97cf-2a50-837a39cb6075@codesourcery.com
State New
Series [libgomp] Enable OpenACC GCN testing

Commit Message

Andrew Stubbs Nov. 14, 2019, 4:36 p.m. UTC
Hi,

This patch adds some necessary bits to enable OpenACC testing for
amdgcn offloading.

The two "check_effective" procedures are not actually needed yet, but 
later patches to test cases will use them.
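
For illustration only (the "-DUSE_GCN" option below is just a made-up
placeholder), a later C test case might then guard on these new effective
targets with the usual DejaGnu directives, along these lines:

  /* Run only when an AMD GCN board is actually usable.  */
  /* { dg-require-effective-target openacc_amdgcn_accel_present } */
  /* Add an option only when GCN is the offload device selected by default.  */
  /* { dg-additional-options "-DUSE_GCN" { target openacc_amdgcn_accel_selected } } */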

OK to commit?

Thanks

Andrew

Comments

Jakub Jelinek Nov. 15, 2019, 12:21 p.m. UTC | #1
On Thu, Nov 14, 2019 at 04:36:38PM +0000, Andrew Stubbs wrote:
> This patch adds some necessary bits to enable OpenACC testing for amdgcn
> offloading.
> 
> The two "check_effective" procedures are not actually needed yet, but later
> patches to test cases will use them.
> 
> OK to commit?

I'm surprised by the set acc_mem_shared 0, I thought gcn is a shared memory
offloading target.

The patch is OpenACC only and what do I know about OpenACC? ;)
If Thomas is ok with it, the patch is good for trunk.

> 2019-11-14  Andrew Stubbs  <ams@codesourcery.com>
> 
> 	libgomp/
> 	* testsuite/lib/libgomp.exp (offload_target_to_openacc_device_type):
> 	Recognize amdgcn.
> 	(check_effective_target_openacc_amdgcn_accel_present): New proc.
> 	(check_effective_target_openacc_amdgcn_accel_selected): New proc.
> 	* testsuite/libgomp.oacc-c++/c++.exp: Set acc_mem_shared for amdgcn.
> 	* testsuite/libgomp.oacc-c/c.exp: Likewise.
> 	* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.

	Jakub
Andrew Stubbs Nov. 15, 2019, 12:38 p.m. UTC | #2
On 15/11/2019 12:21, Jakub Jelinek wrote:
> On Thu, Nov 14, 2019 at 04:36:38PM +0000, Andrew Stubbs wrote:
>> This patch adds some necessary bits to enable OpenACC testing for amdgcn
>> offloading.
>>
>> The two "check_effective" procedures are not actually needed yet, but later
>> patches to test cases will use them.
>>
>> OK to commit?
> 
> I'm surprised by the set acc_mem_shared 0, I thought gcn is a shared memory
> offloading target.

APUs, such as Carrizo, are shared memory. DGPUs, such as Fiji and Vega,
have their own memory. A DGPU can access host memory, provided that it 
has been set up just so, but that is very slow, and I don't know of a 
way to do that without still having to copy the program data into that 
special region.

> The patch is OpenACC only and what do I know about OpenACC? ;)
> If Thomas is ok with it, the patch is good for trunk.

Thanks anyway.

Andrew
Jakub Jelinek Nov. 15, 2019, 12:43 p.m. UTC | #3
On Fri, Nov 15, 2019 at 12:38:06PM +0000, Andrew Stubbs wrote:
> On 15/11/2019 12:21, Jakub Jelinek wrote:
> > On Thu, Nov 14, 2019 at 04:36:38PM +0000, Andrew Stubbs wrote:
> > > This patch adds some necessary bits to enable OpenACC testing for amdgcn
> > > offloading.
> > > 
> > > The two "check_effective" procedures are not actually needed yet, but later
> > > patches to test cases will use them.
> > > 
> > > OK to commit?
> > 
> > I'm surprised by the set acc_mem_shared 0, I thought gcn is a shared memory
> > offloading target.
> 
> APUs, such as Carrizo, are shared memory. DGPUs, such as Fiji and Vega, have
> their own memory. A DGPU can access host memory, provided that it has been
> set up just so, but that is very slow, and I don't know of a way to do that
> without still having to copy the program data into that special region.

Ah, that is going to be interesting e.g. for
#pragma omp requires unified_shared_memory
My plan was that for this the TU constructor would call some libgomp
library function that, in the end, would ensure when loading plugins that
only shared memory supporting plugins are loaded.  If the gcn plugin
will support shared memory only conditionally, we'll need some interfaces to
ensure that.
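
For reference, such a translation unit would look roughly like this -- a
minimal sketch only, not something from this patch:

  #include <stdlib.h>

  /* Every device this program offloads to must provide shared memory.  */
  #pragma omp requires unified_shared_memory

  int
  main (void)
  {
    int *p = malloc (sizeof *p);
    *p = 0;
    /* No map clause for the pointed-to data: under unified_shared_memory
       the device can access host memory directly.  */
    #pragma omp target
    *p = 42;
    return *p != 42;
  }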

	Jakub
Stubbs, Andrew Nov. 15, 2019, 12:59 p.m. UTC | #4
On 15/11/2019 12:43, Jakub Jelinek wrote:
>> APUs, such as Carrizo, are shared memory. DGPUs, such as Fiji and Vega, have
>> their own memory. A DGPU can access host memory, provided that it has been
>> set up just so, but that is very slow, and I don't know of a way to do that
>> without still having to copy the program data into that special region.
> 
> Ah, that is going to be interesting e.g. for
> #pragma omp requires unified_shared_memory
> My plan was that for this the TU constructor would call some libgomp
> library function that, in the end, would ensure when loading plugins that
> only shared memory supporting plugins are loaded.  If the gcn plugin
> will support shared memory only conditionally, we'll need some interfaces to
> ensure that.

This is a problem indeed. Right now the capability has to be declared up 
front, before the device has even been initialized. In theory, there 
could be both devices in a single machine, and that would be even worse, 
but then we don't support machines with heterogeneous DGPUs well either 
(it works fine for user programs that just use the GPU they're given).
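
For context, the "declared up front" part is the plugin's static capability
hook; very roughly (hook and flag names as in libgomp-plugin.h, though the
exact set the GCN plugin returns is only sketched here):

  #include "libgomp-plugin.h"

  /* Answered once, at plugin load time, before any device is initialized.  */
  unsigned int
  GOMP_OFFLOAD_get_caps (void)
  {
    /* A discrete GPU cannot honestly advertise GOMP_OFFLOAD_CAP_SHARED_MEM,
       and there is currently no way to change the answer per device later.  */
    return GOMP_OFFLOAD_CAP_OPENMP_400 | GOMP_OFFLOAD_CAP_OPENACC_200;
  }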

We've been basically ignoring it because a) we don't have any APUs to 
test with, and b) we're being paid to produce a toolchain for 
supercomputers that will never use APUs (nor mismatched DGPUs).

The APU can (pretend to) not use shared memory, of course, so it might 
Just Work, but I can't be sure.

There's also the problem of XNACK support, which we haven't 
implemented in the backend. That's specific to APUs, so we'll never need 
it. You can get some distance without it -- there was no mention of it in 
Honza's original port, which targeted Carrizo -- but I don't really know 
how far.

Really, if nobody with an APU wants to fix it, we should just remove the 
Carrizo support that is there.

Andrew
Thomas Schwinge Dec. 2, 2019, 2:19 p.m. UTC | #5
Hi Andrew!

Sorry for the delay.

On 2019-11-14T16:36:38+0000, Andrew Stubbs <ams@codesourcery.com> wrote:
> This patch adds some necessary bits to enable OpenACC testing for
> amdgcn offloading.

Generally, I'm in favor if you'd consider such a thing (that in principle
is just a copy/adapt of the existing cases) as obvious to commit (even
more so with your "amdgcn port" maintainer hat on), especially so given
that this has been/is blocking you, as Tobias told me more than once.

That said:

> --- a/libgomp/testsuite/lib/libgomp.exp
> +++ b/libgomp/testsuite/lib/libgomp.exp
> @@ -316,6 +316,9 @@ proc offload_target_to_openacc_device_type { offload_target } {
>  	nvptx* {
>  	    return "nvidia"
>  	}
> +	amdgcn* {
> +	    return "gcn"
> +	}
>  	default {
>  	    error "Unknown offload target: $offload_target"
>  	}

Maintain alphabetical sorting?

> @@ -444,3 +447,28 @@ proc check_effective_target_hsa_offloading_selected {} {
>  	check_effective_target_hsa_offloading_selected_nocache
>      }]
>  }
> +# Return 1 if at least one AMD GCN board is present.

Missing vertical space?

> +proc check_effective_target_openacc_amdgcn_accel_present { } {

> +proc check_effective_target_openacc_amdgcn_accel_selected { } {

I'd also have inserted these maintaining alphabetical sorting, but I see
the existing other stuff also doesn't, so no need to bother.

For all the following three:

> --- a/libgomp/testsuite/libgomp.oacc-c++/c++.exp
> +++ b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
> @@ -106,6 +106,9 @@ if { $lang_test_file_found } {
>  
>  		set acc_mem_shared 0
>  	    }
> +	    gcn {
> +		set acc_mem_shared 0
> +	    }
>  	    default {
>  		error "Unknown OpenACC device type: $openacc_device_type (offload target: $offload_target)"
>  	    }

> --- a/libgomp/testsuite/libgomp.oacc-c/c.exp
> +++ b/libgomp/testsuite/libgomp.oacc-c/c.exp
> @@ -69,6 +69,9 @@ foreach offload_target [concat [split $offload_targets ","] "disable"] {
>  
>  	    set acc_mem_shared 0
>  	}
> +	gcn {
> +	    set acc_mem_shared 0
> +	}
>  	default {
>  	    error "Unknown OpenACC device type: $openacc_device_type (offload target: $offload_target)"
>  	}

> --- a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
> @@ -94,6 +94,9 @@ if { $lang_test_file_found } {
>  
>  		set acc_mem_shared 0
>  	    }
> +	    gcn {
> +		set acc_mem_shared 0
> +	    }
>  	    default {
>  		error "Unknown OpenACC device type: $openacc_device_type (offload target: $offload_target)"
>  	    }

..., maintain alphabetical sorting, and please also include the
'untested' check/skip, as done in the 'nvidia' cases.


To record the review effort, please include "Reviewed-by: Thomas Schwinge
<thomas@codesourcery.com>" in the commit log, see
<https://gcc.gnu.org/wiki/Reviewed-by>.


Grüße
 Thomas
Thomas Schwinge Dec. 2, 2019, 2:23 p.m. UTC | #6
Hi!

On 2019-11-15T13:43:04+0100, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Nov 15, 2019 at 12:38:06PM +0000, Andrew Stubbs wrote:
>> On 15/11/2019 12:21, Jakub Jelinek wrote:
>> > I'm surprised by the set acc_mem_shared 0, I thought gcn is a shared memory
>> > offloading target.
>> 
>> APUs, such as Carrizo, are shared memory. DGPUs, such as Fiji and Vega, have
>> their own memory. A DGPU can access host memory, provided that it has been
>> set up just so, but that is very slow, and I don't know of a way to do that
>> without still having to copy the program data into that special region.

For a few years already, Nvidia GPUs/drivers have been supporting what
they call Unified Memory, where the driver/kernel automatically handles
the movement of memory pages between host/device memories.  Given some
reasonable pre-fetching logic (either automatic in the driver/kernel, or
"guided" by the compiler/runtime), this reportedly achieves good
performance -- or even better performance than manually-managed memory
copying, as really only the data pages accessed (plus pre-fetched) will
be copied.
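
As a concrete illustration -- a minimal C sketch against the CUDA runtime
API, not anything GCC generates itself:

  #include <cuda_runtime.h>

  int
  main (void)
  {
    int *data;
    /* One allocation usable from both host and device; the driver migrates
       pages on demand instead of requiring explicit copies.  */
    cudaMallocManaged ((void **) &data, 1024 * sizeof *data,
		       cudaMemAttachGlobal);
    for (int i = 0; i < 1024; i++)
      data[i] = i;	/* Touched on the host...  */
    /* ...and a kernel launched here could read and write 'data' directly,
       with no explicit cudaMemcpy calls.  */
    cudaFree (data);
    return 0;
  }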

For example, see <https://dl.acm.org/citation.cfm?id=3356141> "Compiler
assisted hybrid implicit and explicit GPU memory management under unified
address space", which I've recently (SuperComputing 2019) have seen
presented, or other publications.

This is not currently implemented in GCC, but could/should be at some
point.

This (or even a mixture of manual-discrete/automatic-shared?) would then
be an execution mode of libgomp/plugin, selected at run-time?

> Ah, that is going to be interesting e.g. for
> #pragma omp requires unified_shared_memory
> My plan was that for this the TU constructor would call some libgomp
> library function that, in the end, would ensure when loading plugins that
> only shared memory supporting plugins are loaded.  If the gcn plugin
> will support shared memory only conditionally, we'll need some interfaces to
> ensure that.

(I have not yet completely digested the relevant OpenMP features.)


Grüße
 Thomas
Andrew Stubbs Dec. 2, 2019, 2:41 p.m. UTC | #7
On 02/12/2019 14:23, Thomas Schwinge wrote:
> Hi!
> 
> On 2019-11-15T13:43:04+0100, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Fri, Nov 15, 2019 at 12:38:06PM +0000, Andrew Stubbs wrote:
>>> On 15/11/2019 12:21, Jakub Jelinek wrote:
>>>> I'm surprised by the set acc_mem_shared 0, I thought gcn is a shared memory
>>>> offloading target.
>>>
>>> APUs, such as Carrizo, are shared memory. DGPUs, such as Fiji and Vega, have
>>> their own memory. A DGPU can access host memory, provided that it has been
>>> set up just so, but that is very slow, and I don't know of a way to do that
>>> without still having to copy the program data into that special region.
> 
> For a few years already, Nvidia GPUs/drivers have been supporting what
> they call Unified Memory, where the driver/kernel automatically handles
> the movement of memory pages between host/device memories.  Given some
> reasonable pre-fetching logic (either automatic in the driver/kernel, or
> "guided" by the compiler/runtime), this reportedly achieves good
> performance -- or even better performance than manually-managed memory
> copying, as really only the data pages accessed (plus pre-fetched) will
> be copied.

Yeah, this is not that.  When the AMD GPU accesses host memory it 
appears to bypass both L1 and L2 caching.  There's no copying, just 
direct, on-demand accesses.  This makes the performance really bad. We 
use it only for message passing, which is probably the original intent.

> For example, see <https://dl.acm.org/citation.cfm?id=3356141> "Compiler
> assisted hybrid implicit and explicit GPU memory management under unified
> address space", which I've recently (SuperComputing 2019) have seen
> presented, or other publications.
> 
> This is not currently implemented in GCC, but could/should be at some
> point.
> 
> This (or even a mixture of manual-discrete/automatic-shared?) would then
> be an execution mode of libgomp/plugin, selected at run-time?

All we really need from libgomp, to support AMD APUs, is to be able to 
toggle the shared memory mode dynamically, rather than having it baked 
into the capabilities at start-up. Probably we could figure out the 
capabilities at run-time already, but that would break when a system has 
both kinds of device. Anyway, this is theoretical as I have no intention 
to implement support for such devices.

Andrew
Andrew Stubbs Dec. 3, 2019, 12:56 p.m. UTC | #8
On 02/12/2019 14:19, Thomas Schwinge wrote:
> Generally, I'm in favor if you'd consider such a thing (that in principle
> is just a copy/adapt of the existing cases) as obvious to commit (even
> more so with your "amdgcn port" maintainer hat on), especially so given
> that this has been/is blocking you, as Tobias told me more than once.

I probably will do for incremental tweaks, but I left some stuff in this 
one to see if you were awake (that's my story and I'm sticking to it).

Here's what I have committed (and I just realized I forgot the newfangled 
Reviewed-by tag, sorry).

Andrew
Thomas Schwinge Dec. 3, 2019, 1:31 p.m. UTC | #9
Hi!

On 2019-12-03T12:56:49+0000, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 02/12/2019 14:19, Thomas Schwinge wrote:
>> Generally, I'm in favor if you'd consider such a thing (that in principle
>> is just a copy/adapt of the existing cases) as obvious to commit (even
>> more so with your "amdgcn port" maintainer hat on), especially so given
>> that this has been/is blocking you, as Tobias told me more than once.
>
> I probably will do for incremental tweaks, but I left some stuff in this 
> one to see if you were awake (that's my story and I'm sticking to it).

Haha!  ;-P


> Here's what I have committed (and I just realized I forgot the newfangled
> Reviewed-by tag, sorry).

No worries.  That's just an experiment anyway -- seems to work/have been
picked up for glibc, but not so much for GCC.


About the alphabetic sorting that I've mentioned: sometimes we do that,
sometimes we sort per 'GOMP_DEVICE_*'/'acc_device_t'/whatever ordering,
sometimes special cases go to the beginning/end, sometimes it's "in order
of appearance", or seemingly random, or any combination of all these.  So
there isn't a general guideline to follow.  However, to make reading the
code easier/more enjoyable, what I like to see is at least some kind of
local consistency, instead of just always adding new stuff to the
end. (Where the latter might indeed be the right thing to do in certain
cases, given that 'GOMP_DEVICE_GCN' is the last one in the list.)


Grüße
 Thomas
Thomas Schwinge June 8, 2021, 9:17 a.m. UTC | #10
Hi!

On 2019-11-14T16:36:38+0000, Andrew Stubbs <ams@codesourcery.com> wrote:
> This patch adds some necessary bits to enable OpenACC testing for
> amdgcn offloading.

> --- a/libgomp/testsuite/lib/libgomp.exp
> +++ b/libgomp/testsuite/lib/libgomp.exp

> +# Return 1 if at least one AMD GCN board is present, and the AMD GCN device
> +# type is selected by default.
> +
> +proc check_effective_target_openacc_amdgcn_accel_selected { } {
> +    if { ![check_effective_target_openacc_amdgcn_accel_present] } {
> +     return 0;
> +    }
> +    global offload_target
> +    if { [string match "amdgcn*" $offload_target] } {
> +        return 1;
> +    }
> +    return 0;
> +}

Pushed "[GCN] Streamline
'libgomp/testsuite/lib/libgomp.exp:check_effective_target_openacc_radeon_accel_selected'"
to master branch in commit f9da798ba6348feaada80de04bc72cdf0c4a1f70, see
attached.


Grüße
 Thomas



Patch

Enable OpenACC GCN testing.

2019-11-14  Andrew Stubbs  <ams@codesourcery.com>

	libgomp/
	* testsuite/lib/libgomp.exp (offload_target_to_openacc_device_type):
	Recognize amdgcn.
	(check_effective_target_openacc_amdgcn_accel_present): New proc.
	(check_effective_target_openacc_amdgcn_accel_selected): New proc.
	* testsuite/libgomp.oacc-c++/c++.exp: Set acc_mem_shared for amdgcn.
	* testsuite/libgomp.oacc-c/c.exp: Likewise.
	* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 14d9b5f1305..802c6ea275f 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -316,6 +316,9 @@  proc offload_target_to_openacc_device_type { offload_target } {
 	nvptx* {
 	    return "nvidia"
 	}
+	amdgcn* {
+	    return "gcn"
+	}
 	default {
 	    error "Unknown offload target: $offload_target"
 	}
@@ -444,3 +447,28 @@  proc check_effective_target_hsa_offloading_selected {} {
 	check_effective_target_hsa_offloading_selected_nocache
     }]
 }
+# Return 1 if at least one AMD GCN board is present.
+
+proc check_effective_target_openacc_amdgcn_accel_present { } {
+    return [check_runtime openacc_amdgcn_accel_present {
+	#include <openacc.h>
+	int main () {
+	return !(acc_get_num_devices (acc_device_gcn) > 0);
+	}
+    } "" ]
+}
+
+# Return 1 if at least one AMD GCN board is present, and the AMD GCN device
+# type is selected by default.
+
+proc check_effective_target_openacc_amdgcn_accel_selected { } {
+    if { ![check_effective_target_openacc_amdgcn_accel_present] } {
+	return 0;
+    }
+    global offload_target
+    if { [string match "amdgcn*" $offload_target] } {
+        return 1;
+    }
+    return 0;
+}
+
diff --git a/libgomp/testsuite/libgomp.oacc-c++/c++.exp b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
index dcefa792ca4..b2502a5f94c 100644
--- a/libgomp/testsuite/libgomp.oacc-c++/c++.exp
+++ b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
@@ -106,6 +106,9 @@  if { $lang_test_file_found } {
 
 		set acc_mem_shared 0
 	    }
+	    gcn {
+		set acc_mem_shared 0
+	    }
 	    default {
 		error "Unknown OpenACC device type: $openacc_device_type (offload target: $offload_target)"
 	    }
diff --git a/libgomp/testsuite/libgomp.oacc-c/c.exp b/libgomp/testsuite/libgomp.oacc-c/c.exp
index 55cd40f1e99..420237eafe9 100644
--- a/libgomp/testsuite/libgomp.oacc-c/c.exp
+++ b/libgomp/testsuite/libgomp.oacc-c/c.exp
@@ -69,6 +69,9 @@  foreach offload_target [concat [split $offload_targets ","] "disable"] {
 
 	    set acc_mem_shared 0
 	}
+	gcn {
+	    set acc_mem_shared 0
+	}
 	default {
 	    error "Unknown OpenACC device type: $openacc_device_type (offload target: $offload_target)"
 	}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
index 852f372b319..1a731dd4c24 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
+++ b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
@@ -94,6 +94,9 @@  if { $lang_test_file_found } {
 
 		set acc_mem_shared 0
 	    }
+	    gcn {
+		set acc_mem_shared 0
+	    }
 	    default {
 		error "Unknown OpenACC device type: $openacc_device_type (offload target: $offload_target)"
 	    }