[0/6,og9] OpenACC worker partitioning in middle end (AMD GCN)

Message ID cover.1567644180.git.julian@codesourcery.com

Julian Brown, Sept. 5, 2019, 1:45 a.m. UTC
This patch series adds support for worker partitioning in the middle
end. The OpenACC device-lowering pass (oaccdevlow) is split into three
passes: the first assigns parallelism levels to loops; the second (new)
rewrites basic blocks to implement a neutering/broadcasting scheme for
the OpenACC worker-partitioned execution mode; and the third performs
the rest of the previous device-lowering pass.
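
As a rough illustration of the neutering/broadcasting idea (this is not
the code in omp-sese.c), the sketch below models workers as entries in a
vector. It assumes the following scheme: in worker-single code only one
worker executes a block while the others are "neutered" and skip it, and
the value that worker produces is then broadcast to the rest before
worker-partitioned code begins. The names WorkerState, worker_single and
run_demo are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: one private copy of a variable per worker.
struct WorkerState {
  int x = 0;
};

// Model of a worker-single region. Only worker 0 runs `block`; the other
// workers are "neutered" and skip it. Worker 0's result is then copied to
// every other worker through a shared slot (the "broadcast").
template <typename Block>
void worker_single(std::vector<WorkerState> &workers, Block block) {
  assert(!workers.empty());
  int shared_slot = 0;  // stands in for memory visible to all workers

  for (std::size_t w = 0; w < workers.size(); ++w) {
    if (w == 0) {
      block(workers[w]);           // the active worker executes the block
      shared_slot = workers[w].x;  // and publishes the live-out value
    }
    // w != 0: neutered, nothing to execute
  }

  // On real hardware a barrier would separate the write from the reads.
  for (std::size_t w = 1; w < workers.size(); ++w)
    workers[w].x = shared_slot;
}

// Run one worker-single block on `num_workers` simulated workers and
// return each worker's view of x afterwards.
std::vector<int> run_demo(int num_workers, int value) {
  std::vector<WorkerState> workers(num_workers);
  worker_single(workers, [value](WorkerState &s) { s.x = value; });
  std::vector<int> out;
  for (const WorkerState &s : workers)
    out.push_back(s.x);
  return out;
}
```

After run_demo returns, every simulated worker holds the value that only
worker 0 computed, which is the property the broadcast step is meant to
guarantee before partitioned execution resumes.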

Also included are patches to add support for placing gang-private
variables in special memory (e.g. LDS, "local-data share", on AMD GCN),
and to rewrite reductions targeting reference variables to use temporary
local scalar variables instead.
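
To make the reference-reduction rewrite concrete, here is a source-level
analogue written by hand. It is only a sketch: the actual patch does the
transformation inside the compiler, and the function names below are
hypothetical. The first function reduces directly into a C++ reference;
the second shows the localized form, where the reduction targets a
temporary local scalar that is copied back afterwards. The pragmas are
guarded so the code also builds and runs serially without OpenACC.

```cpp
#include <cassert>

// Original form: the reduction clause names `total`, a C++ reference.
void sum_into_reference(int &total, const int *a, int n) {
#ifdef _OPENACC
#pragma acc parallel loop reduction(+:total) copyin(a[0:n])
#endif
  for (int i = 0; i < n; ++i)
    total += a[i];
}

// Localized form (hand-written analogue): reduce into a local scalar,
// then store the result back through the reference once.
void sum_localized(int &total, const int *a, int n) {
  assert(n >= 0);
  int local_total = total;  // temporary local scalar
#ifdef _OPENACC
#pragma acc parallel loop reduction(+:local_total) copyin(a[0:n])
#endif
  for (int i = 0; i < n; ++i)
    local_total += a[i];
  total = local_total;      // single write-back to the referenced object
}
```

Both functions are intended to produce the same result; the second simply
keeps the reduction variable a plain scalar for the duration of the loop.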

Further commentary is provided alongside individual patches.

Tested with offloading to AMD GCN. I will apply this series to the
openacc-gcc-9-branch shortly.

Thanks,

Julian

Julian Brown (6):
  [og9] Target-dependent gang-private variable decl rewriting
  [og9] OpenACC middle-end worker-partitioning support
  [og9] AMD GCN adjustments for middle-end worker partitioning
  [og9] Fix up tests for oaccdevlow pass splitting
  [og9] Reference reduction localization
  [og9] Enable worker partitioning for AMD GCN

 gcc/ChangeLog.openacc                         |   83 +
 gcc/Makefile.in                               |    1 +
 gcc/config/gcn/gcn-protos.h                   |    2 +-
 gcc/config/gcn/gcn-tree.c                     |    6 +-
 gcc/config/gcn/gcn.c                          |   15 +-
 gcc/config/gcn/gcn.opt                        |    2 +-
 gcc/doc/tm.texi                               |   14 +
 gcc/doc/tm.texi.in                            |    6 +
 gcc/gimplify.c                                |  102 +
 gcc/omp-builtins.def                          |    8 +
 gcc/omp-low.c                                 |   47 +-
 gcc/omp-offload.c                             |  290 ++-
 gcc/omp-offload.h                             |    1 +
 gcc/omp-sese.c                                | 2036 +++++++++++++++++
 gcc/omp-sese.h                                |   26 +
 gcc/passes.def                                |    2 +
 gcc/target.def                                |   19 +
 gcc/targhooks.h                               |    1 +
 gcc/testsuite/ChangeLog.openacc               |   12 +
 .../goacc/classify-kernels-unparallelized.c   |    8 +-
 .../c-c++-common/goacc/classify-kernels.c     |    8 +-
 .../c-c++-common/goacc/classify-parallel.c    |    8 +-
 .../c-c++-common/goacc/classify-routine.c     |    8 +-
 .../goacc/classify-kernels-unparallelized.f95 |    8 +-
 .../gfortran.dg/goacc/classify-kernels.f95    |    8 +-
 .../gfortran.dg/goacc/classify-parallel.f95   |    8 +-
 .../gfortran.dg/goacc/classify-routine.f95    |    8 +-
 gcc/tree-core.h                               |    4 +-
 gcc/tree-pass.h                               |    2 +
 gcc/tree.c                                    |   11 +-
 gcc/tree.h                                    |    2 +
 libgomp/ChangeLog.openacc                     |    5 +
 libgomp/plugin/plugin-gcn.c                   |    4 +-
 33 files changed, 2660 insertions(+), 105 deletions(-)
 create mode 100644 gcc/omp-sese.c
 create mode 100644 gcc/omp-sese.h