diff mbox series

Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717] (was: [PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717])

Message ID 87levrsqno.fsf@dem-tschwing-1.ger.mentorg.com
State New
Headers show
Series Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717] (was: [PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717]) | expand

Commit Message

Thomas Schwinge April 26, 2022, 5:25 p.m. UTC
Hi!

On 2022-04-25T23:19:26+0200, I wrote:
> On 2022-04-20T19:06:17+0200, Jakub Jelinek <jakub@redhat.com> wrote:
>> So that move_sese_region_to_fn works properly, OpenMP/OpenACC constructs
>> for which that function is invoked need an extra artificial BIND_EXPR
>> around their body so that we move all variables of the bodies.
>>
>> The C/C++ FEs do that both for OpenMP constructs like OMP_PARALLEL, OMP_TASK
>> or OMP_TARGET and for OpenACC constructs that behave similarly to
>> OMP_TARGET, but the Fortran FE only does that for OpenMP constructs.
>>
>> The following patch does that for OpenACC constructs too.
>> This fixes ICE on the attached testcase.
>
> ACK, thanks.

>> Unfortunately, it also regresses
>> FAIL: gfortran.dg/goacc/privatization-1-compute-loop.f90   -O  (test for excess errors)
>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O0  (test for excess errors)
>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O1  (test for excess errors)
>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for excess errors)
>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  (test for excess errors)
>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -Os  (test for excess errors)
>> Those emits emit tons of various messages and now there are some extra ones,
>
> I've fixed these up.

One more issue became apparent, where the code changes pushed actually do
lead to a GCN offloading compilation failure:

    [...]/libgomp.oacc-fortran/print-1.f90: In function ‘MAIN__._omp_fn.0’:
    [...]/libgomp.oacc-fortran/print-1.f90:13:14: error: 512 bytes of gang-private data-share memory exhausted (increase with ‘-mgang-private-size=560’, for example)
       13 | !$acc parallel
          |              ^

In my configuration, I may indeed fix GCN offloading compilation with
'-foffload-options=amdgcn-amdhsa=-mgang-private-size=560', but I don't
think that's generally correct/sufficient, so in the the attached
"Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation
[PR104717]", I instead "raise '-mgang-private-size' to an arbitrary high
value".  This avoids having to route the actual 'sizeof' from GCC build
down to the test suite harness (which ought to be doable, but
non-trivial).  OK to push that:

    +! For GCN offloading compilation, when gang-privatizing 'dt_parm.N'
    +! (see below), we run into an 'gang-private data-share memory exhausted'
    +! error: the default '-mgang-private-size' is too small.  Per
    +! 'gcc/fortran/trans-io.cc'/'libgfortran/io/io.h', that one is
    +! 'struct st_parameter_dt', which indeed is rather big.  Instead of
    +! working out its exact size (which may vary per GCC configuration),
    +! raise '-mgang-private-size' to an arbitrary high value.
    +! { dg-additional-options "-foffload-options=amdgcn-amdhsa=-mgang-private-size=13579" { target openacc_radeon_accel_selected } }

... to master branch? (This doubles the use/testing of the
'-mgang-private-size' option!)  ;-)

We've currently not been doing OpenACC privatization scanning in
'libgomp.oacc-fortran/print-1.f90', which I've now added, to help
document the issue; no need to review that.

Of course, the issue could alternatively be fixed by adding more logic to
the GCN back end to auto-scale the allocation, or be fixed by adding more
logic to the compiler to avoid gang-privatizing varibales such as
'dt_parm.N' in such cases, but that's not something I'm going to look
into at this point.

Or, of course, be avoided by re-writing the test case to not require
gang-privatizing 'dt_parm.N', but the test case is correct as it is.


Grüße
 Thomas


>       PR fortran/104717
>       gcc/fortran/
>       * trans-openmp.cc (gfc_trans_oacc_construct): Wrap construct body
>       in an extra BIND_EXPR.

> --- a/gcc/fortran/trans-openmp.cc
> +++ b/gcc/fortran/trans-openmp.cc
> @@ -4444,7 +4444,9 @@ gfc_trans_oacc_construct (gfc_code *code)
>    gfc_start_block (&block);
>    oacc_clauses = gfc_trans_omp_clauses (&block, code->ext.omp_clauses,
>                                       code->loc, false, true);
> +  pushlevel ();
>    stmt = gfc_trans_omp_code (code->block->next, true);
> +  stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
>    stmt = build2_loc (gfc_get_location (&code->loc), construct_code,
>                    void_type_node, stmt, oacc_clauses);
>    gfc_add_expr_to_block (&block, stmt);


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

Comments

Thomas Schwinge April 26, 2022, 5:51 p.m. UTC | #1
Hi!

On 2022-04-26T19:25:31+0200, I wrote:
> On 2022-04-25T23:19:26+0200, I wrote:
>> On 2022-04-20T19:06:17+0200, Jakub Jelinek <jakub@redhat.com> wrote:
>>> So that move_sese_region_to_fn works properly, OpenMP/OpenACC constructs
>>> for which that function is invoked need an extra artificial BIND_EXPR
>>> around their body so that we move all variables of the bodies.
>>>
>>> The C/C++ FEs do that both for OpenMP constructs like OMP_PARALLEL, OMP_TASK
>>> or OMP_TARGET and for OpenACC constructs that behave similarly to
>>> OMP_TARGET, but the Fortran FE only does that for OpenMP constructs.
>>>
>>> The following patch does that for OpenACC constructs too.
>>> This fixes ICE on the attached testcase.
>>
>> ACK, thanks.
>
>>> Unfortunately, it also regresses
>>> FAIL: gfortran.dg/goacc/privatization-1-compute-loop.f90   -O  (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O0  (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O1  (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -Os  (test for excess errors)
>>> Those emits emit tons of various messages and now there are some extra ones,
>>
>> I've fixed these up.
>
> One more issue became apparent, where the code changes pushed actually do
> lead to a GCN offloading compilation failure:
>
>     [...]/libgomp.oacc-fortran/print-1.f90: In function ‘MAIN__._omp_fn.0’:
>     [...]/libgomp.oacc-fortran/print-1.f90:13:14: error: 512 bytes of gang-private data-share memory exhausted (increase with ‘-mgang-private-size=560’, for example)
>        13 | !$acc parallel
>           |              ^
>
> In my configuration, I may indeed fix GCN offloading compilation with
> '-foffload-options=amdgcn-amdhsa=-mgang-private-size=560', but I don't
> think that's generally correct/sufficient, so in the the attached
> "Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation
> [PR104717]", I instead "raise '-mgang-private-size' to an arbitrary high
> value".  This avoids having to route the actual 'sizeof' from GCC build
> down to the test suite harness (which ought to be doable, but
> non-trivial).  OK to push that:
>
>     +! For GCN offloading compilation, when gang-privatizing 'dt_parm.N'
>     +! (see below), we run into an 'gang-private data-share memory exhausted'
>     +! error: the default '-mgang-private-size' is too small.  Per
>     +! 'gcc/fortran/trans-io.cc'/'libgfortran/io/io.h', that one is
>     +! 'struct st_parameter_dt', which indeed is rather big.  Instead of
>     +! working out its exact size (which may vary per GCC configuration),
>     +! raise '-mgang-private-size' to an arbitrary high value.
>     +! { dg-additional-options "-foffload-options=amdgcn-amdhsa=-mgang-private-size=13579" { target openacc_radeon_accel_selected } }
>
> ... to master branch? (This doubles the use/testing of the
> '-mgang-private-size' option!)  ;-)

Eh.  That only works with the default GCN multilib '-march=fiji', testing
on gfx803 amdfury2 system.  For all of '-march=gfx900' (amdnano2),
'-march=gfx906' (amd_ryzen3), '-march=gfx908' (amd-instinct1), I get:

    libgomp: GCN fatal error: Asynchronous queue error
    Runtime message: HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address.

..., and I still get that if lowering the allocation to the minimum,
'-foffload-options=amdgcn-amdhsa=-mgang-private-size=560'.

This is a really simple OpenACC 'parallel' construct:

    !$acc parallel
      write (0, '("The answer is ", I2)') var
    !$acc end parallel

..., which ought to launch a 1-gang x 1-worker x 1-vector GPU kernel, so
I'd assume '-mgang-private-size=560' (or '-mgang-private-size=13579' in
fact) is not a problem?

Help?


Grüße
 Thomas


> We've currently not been doing OpenACC privatization scanning in
> 'libgomp.oacc-fortran/print-1.f90', which I've now added, to help
> document the issue; no need to review that.
>
> Of course, the issue could alternatively be fixed by adding more logic to
> the GCN back end to auto-scale the allocation, or be fixed by adding more
> logic to the compiler to avoid gang-privatizing varibales such as
> 'dt_parm.N' in such cases, but that's not something I'm going to look
> into at this point.
>
> Or, of course, be avoided by re-writing the test case to not require
> gang-privatizing 'dt_parm.N', but the test case is correct as it is.
>
>
> Grüße
>  Thomas
>
>
>>      PR fortran/104717
>>      gcc/fortran/
>>      * trans-openmp.cc (gfc_trans_oacc_construct): Wrap construct body
>>      in an extra BIND_EXPR.
>
>> --- a/gcc/fortran/trans-openmp.cc
>> +++ b/gcc/fortran/trans-openmp.cc
>> @@ -4444,7 +4444,9 @@ gfc_trans_oacc_construct (gfc_code *code)
>>    gfc_start_block (&block);
>>    oacc_clauses = gfc_trans_omp_clauses (&block, code->ext.omp_clauses,
>>                                      code->loc, false, true);
>> +  pushlevel ();
>>    stmt = gfc_trans_omp_code (code->block->next, true);
>> +  stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
>>    stmt = build2_loc (gfc_get_location (&code->loc), construct_code,
>>                   void_type_node, stmt, oacc_clauses);
>>    gfc_add_expr_to_block (&block, stmt);
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
diff mbox series

Patch

From 3dfea06371aa9bcc84ad75a2bc821a45e131dca6 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Tue, 26 Apr 2022 18:41:23 +0200
Subject: [PATCH] Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading
 compilation [PR104717]

That got broken by recent commit b2202431910e30d8505c94d1cb9341cac7080d10
"fortran: Fix up gfc_trans_oacc_construct [PR104717]".

	PR fortran/104717
	libgomp/
	* testsuite/libgomp.oacc-fortran/print-1.f90: Add OpenACC
	privatization scanning.  For GCN offloading compilation, raise
	'-mgang-private-size'.
---
 .../libgomp.oacc-fortran/print-1.f90          | 30 ++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/print-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/print-1.f90
index 7b7f73741fe..42a8538e1fb 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/print-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/print-1.f90
@@ -6,11 +6,39 @@ 
 ! Separate file 'print-1-nvptx.f90' for nvptx offloading.
 ! { dg-skip-if "separate file" { offload_target_nvptx } }
 
+! For GCN offloading compilation, when gang-privatizing 'dt_parm.N'
+! (see below), we run into an 'gang-private data-share memory exhausted'
+! error: the default '-mgang-private-size' is too small.  Per
+! 'gcc/fortran/trans-io.cc'/'libgfortran/io/io.h', that one is
+! 'struct st_parameter_dt', which indeed is rather big.  Instead of
+! working out its exact size (which may vary per GCC configuration),
+! raise '-mgang-private-size' to an arbitrary high value.
+! { dg-additional-options "-foffload-options=amdgcn-amdhsa=-mgang-private-size=13579" { target openacc_radeon_accel_selected } }
+
+! { dg-additional-options "-fopt-info-note-omp" }
+! { dg-additional-options "-foffload=-fopt-info-note-omp" }
+
+! { dg-additional-options "--param=openacc-privatization=noisy" }
+! { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
+! Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+! { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+! It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+! passed to 'incr' may be unset, and in that case, it will be set to [...]",
+! so to maintain compatibility with earlier Tcl releases, we manually
+! initialize counter variables:
+! { dg-line l_dummy[variable c_compute 0] }
+! { dg-message dummy {} { target iN-VAl-Id } l_dummy } to avoid
+! "WARNING: dg-line var l_dummy defined, but not used".
+
 program main
   implicit none
   integer :: var = 42
 
-!$acc parallel
+!$acc parallel ! { dg-line l_compute[incr c_compute] }
+  ! { dg-note {variable 'dt_parm\.[0-9]+' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute$c_compute }
+  !   { dg-note {variable 'dt_parm\.[0-9]+' ought to be adjusted for OpenACC privatization level: 'gang'} {} { target *-*-* } l_compute$c_compute }
+  !   { dg-note {variable 'dt_parm\.[0-9]+' adjusted for OpenACC privatization level: 'gang'} {} { target { ! openacc_host_selected } } l_compute$c_compute }
   write (0, '("The answer is ", I2)') var
 !$acc end parallel
 
-- 
2.25.1