diff mbox series

[gfortran] Set omp_requires_mask for dynamic_allocators.

Message ID 20220221142440.3987700-1-abidh@codesourcery.com
State New
Headers show
Series [gfortran] Set omp_requires_mask for dynamic_allocators. | expand

Commit Message

Abid Qadeer Feb. 21, 2022, 2:24 p.m. UTC
This patch fixes an issue that although gfortran accepts
'requires dynamic_allocators', it does not set the omp_requires_mask
accordingly.

gcc/fortran/ChangeLog:

	* parse.cc (gfc_parse_file): Set OMP_REQUIRES_DYNAMIC_ALLOCATORS
	bit in omp_requires_mask.
---
 gcc/fortran/parse.cc | 3 +++
 1 file changed, 3 insertions(+)

Comments

Jakub Jelinek Feb. 21, 2022, 3:50 p.m. UTC | #1
On Mon, Feb 21, 2022 at 02:24:40PM +0000, Hafiz Abid Qadeer wrote:
> This patch fixes an issue that although gfortran accepts
> 'requires dynamic_allocators', it does not set the omp_requires_mask
> accordingly.
> 
> gcc/fortran/ChangeLog:
> 
> 	* parse.cc (gfc_parse_file): Set OMP_REQUIRES_DYNAMIC_ALLOCATORS
> 	bit in omp_requires_mask.
> ---
>  gcc/fortran/parse.cc | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
> index db918291b10..821555bd85f 100644
> --- a/gcc/fortran/parse.cc
> +++ b/gcc/fortran/parse.cc
> @@ -6890,6 +6890,9 @@ done:
>        break;
>      }
>  
> +  if (omp_requires & OMP_REQ_DYNAMIC_ALLOCATORS)
> +    omp_requires_mask
> +	= (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_DYNAMIC_ALLOCATORS);
>    /* Do the parse tree dump.  */
>    gfc_current_ns = flag_dump_fortran_original ? gfc_global_ns_list : NULL;

I see we do that for !$omp requires atomic_default_mem_order(...)
but it doesn't look correct to me.

The thing is, omp_requires_mask was added for C/C++ from the C/C++ notion of
translation units (and a question is how does that cope with C++20 modules),
with the assumption that once certain #pragma omp requires is seen, it
applies for the rest of the translation unit and there are some restrictions
that require it to appear before certain constructs in the source.
But, Fortran I think doesn't really have a concept of the translation unit,
the OpenMP term compilation unit is in Fortran program unit, so each
function/subroutine should have its own.  So, instead of what gfc_parse_file
does currently where it computes omp_requires as or of requires from each
function/subroutine (I think especially for atomic_default_mem_order that
can do really weird things, nothing requires that e.g. in different
functions those can't be different in Fortran, while in C/C++ it needs to be
the same), we need to make sure that omp_requires_mask omp-generic.cc sees
or uses is for Fortran the value from the current function/subroutine.

For the yet unimplemented requires unified_address etc., the plan was that
we'd emit the requirement e.g. into the offloading data such that we could
tell the runtime library all the requirements together from whole program or
shared library.  In that case using an or from the various
functions/subroutines is desirable, if at least one function requires
unified_address, the runtime should filter out any devices that don't
satisfy it, etc.

	Jakub
Tobias Burnus Feb. 21, 2022, 5:02 p.m. UTC | #2
Hi Jakub, hi Abid, hi all,

On 21.02.22 16:50, Jakub Jelinek via Fortran wrote:
> The thing is, omp_requires_mask was added for C/C++ from the C/C++ notion of
> translation units (and a question is how does that cope with C++20 modules),
> with the assumption that once certain #pragma omp requires is seen, it
> applies for the rest of the translation unit and there are some restrictions
> that require it to appear before certain constructs in the source.
>
> But, Fortran I think doesn't really have a concept of the translation unit,
> the OpenMP term compilation unit is in Fortran program unit, so each
> function/subroutine should have its own.

[Nit picking: "A program unit is a main program, an external subprogram,
a module, a submodule, or a block data program unit." Thus,
subroutines/functions which are contained inside a (sub)module or the
main program or another subroutine/function do not form a program
unit by themselves.]

I wonder whether there is a real problem in terms of the ME, but maybe
there is.

For atomic_default_mem_order: That's purely handle by the FEs by
setting the default – and just using it when parsing the 'atomic'
directive, if there is no explicit mem_order.

And for reverse_offload, unified_address or unified_shared_memory,
it has to be always the same in all 'compilation units' (which use
device-related OpenMP things).

I think both is handled correctly by gfortran and the C/C++ FE.
For gfortran, including pulling those requires by use-association
from a module ("unless the directive appears by referencing a module"
implies that this is intended).

The interesting question is about "requires dynamic_allocators".
However, I think one can still stuff it into a TU-wide
omp_requires_mask. While not all TU or even Fortran program units
have to set it, as soon as it appears in any 'requires', it affects
the available devices.

Thus, I do not see a problem to treat it like, e.g., unified_shared_memory,
except that there should be no error when not set in another program unit
(that uses OpenMP target stuff).

> So, instead of what gfc_parse_file
> does currently where it computes omp_requires as or of requires from each
> function/subroutine (I think especially for atomic_default_mem_order that
> can do really weird things, nothing requires that e.g. in different
> functions those can't be different in Fortran, while in C/C++ it needs to be
> the same), we need to make sure that omp_requires_mask omp-generic.cc sees
> or uses is for Fortran the value from the current function/subroutine.

Cf. above - is this really needed? And do you think there is an issue with
the current implementation?

> For the yet unimplemented requires unified_address etc., the plan was that
> we'd emit the requirement e.g. into the offloading data such that we could
> tell the runtime library all the requirements together from whole program or
> shared library.  In that case using an or from the various
> functions/subroutines is desirable, if at least one function requires
> unified_address, the runtime should filter out any devices that don't
> satisfy it, etc.

Regarding the implementation, there is a patch for it on OG11,
https://gcc.gnu.org/g:f5bfc65f9a6 which does an initial implementation.
(It prints a diagnostic instead of doing the filtering – but that could
be fixed easily. I think until and include 5.2, the spec is not that clear.
Post 5.2, the "supported devices" definition makes it clear, that the
device list should be filtered.)

Tobias

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Jakub Jelinek Feb. 21, 2022, 5:47 p.m. UTC | #3
On Mon, Feb 21, 2022 at 06:02:06PM +0100, Tobias Burnus wrote:
> I wonder whether there is a real problem in terms of the ME, but maybe
> there is.
> 
> For atomic_default_mem_order: That's purely handle by the FEs by
> setting the default – and just using it when parsing the 'atomic'
> directive, if there is no explicit mem_order.

Well, for !$omp requires atomic_default_mem_order(whatever)
vs. !$omp atomic that is handled purely in the FE and I hope we do it right.
Where ME is involved is
!$omp requires atomic_default_mem_order(whatever) vs.
!$omp declare variant ...atomic_default_mem_order(whatever).
That is handled in omp-generic.cc and right now that is done during
gimplification of a function.
My reading of what gfc_parse_file does is that if I have:
subroutine foo
  !$omp requires atomic_default_mem_order(relaxed)
end subroutine foo
subroutine bar
  !$omp requires atomic_default_mem_order(acq_rel)
end subroutine bar
subroutine baz
  interface
    subroutine foo
    end subroutine
  end interface
  interface
    subroutine bar
    end subroutine
    !$omp declare variant (foo) &
    !$omp & match (implementation={atomic_default_mem_order(seq_cst)})
  end interface
  call bar
end subroutine baz
then it will call foo because omp_requires in one function is
OMP_MEMORY_ORDER_RELAXED aka 1 and in another one
OMP_MEMORY_ORDER_ACQ_REL aka 4, and (1 | 4) == OMP_MEMORY_ORDER_SEQ_CST
whenb no !$omp requires is in the baz program unit visible and so
it should just call bar.

And similarly with dynamic_allocators, if I have:
subroutine qux
  !$omp requires dynamic_allocators
end subroutine qux
subroutine corge
  interface
    subroutine garply
    end subroutine
    !$omp declare variant (quux) &
    !$omp & match (implementation={dynamic_allocators})
  end interface
  call garply
end subroutine corge
I think with the posted patch it would call quux which it should not.

	Jakub
Tobias Burnus Feb. 21, 2022, 6:47 p.m. UTC | #4
Hi Jakub,

On 21.02.22 18:47, Jakub Jelinek wrote:
> Where ME is involved is
> !$omp requires atomic_default_mem_order(whatever) vs.
> !$omp declare variant ...atomic_default_mem_order(whatever).

Ups, missed that case. (Also because there wasn't 'declare variant' when
implementing 'requires' in Fortran.)

Disclaimer to all of the following remarks: I do not understand context
selectors and their fineprint. Thus, my comments my be completely off:

> subroutine baz
> ...
>    interface
>      subroutine bar
>      end subroutine
>      !$omp declare variant (foo) &
>      !$omp & match (implementation={atomic_default_mem_order(seq_cst)})
>    end interface
>    call bar
> end subroutine baz

I concur that in this case, it needs to know the 'atomic_default_mem_order'
of baz. — But that seems to be not implementable using a module as

module m_foo
   !$omp requires atomic_default_mem_order(...)
contains
   subroutine foo
     ...
   end
end module m_bar
...
subroutine baz
   use m_foo, only: foo
   ...
end

seems to make the 'requires' available - such that it cannot be overridden
via a local 'require atomic_default_mem_order'. And having a 'use m_bar'
then has conflicting declarations. Similar probably with C++ modules,
unless the 'requires' does not propagate. (Does it?)

I find it odd to have only code which works when not using modules.
(Explicitly using the mem_order on 'omp atomic' still works.)


And for the other requires in context selectors, I do not really understand how they
are supposed to get used, either. If any 'unified_shared_memory' or 'dynamic_allocators'
appears (in linked-in code), it is in principle callable – the the run-time library
should then remove all devices which do not support it, possibly only keeping the
host device; for USM, it even has to be present in all compilation units. Thus, just
directly calling the dynamic_allocators/unified_shared_memory should have the same
effect at the end, shouldn't it?

Tobias

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
diff mbox series

Patch

diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index db918291b10..821555bd85f 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -6890,6 +6890,9 @@  done:
       break;
     }
 
+  if (omp_requires & OMP_REQ_DYNAMIC_ALLOCATORS)
+    omp_requires_mask
+	= (enum omp_requires) (omp_requires_mask | OMP_REQUIRES_DYNAMIC_ALLOCATORS);
   /* Do the parse tree dump.  */
   gfc_current_ns = flag_dump_fortran_original ? gfc_global_ns_list : NULL;