[OG11,committed,00/22] OpenACC "kernels" Improvements

Message ID: 20211117160330.20029-1-frederik@codesourcery.com
Series: OpenACC "kernels" Improvements

Frederik Harwath Nov. 17, 2021, 4:03 p.m. UTC
Hi,

this patch series implements the rework of the OpenACC "kernels"
implementation that was announced at the GNU Tools Track of this
year's Linux Plumbers Conference; see
https://linuxplumbersconf.org/event/11/contributions/998/.  The
central step is the commit titled "openacc: Use Graphite for
dependence analysis in "kernels" regions", whose commit message
contains further explanations.
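
For readers unfamiliar with the construct, here is a minimal C sketch of an OpenACC "kernels" region. The function name scale_and_scan and its data clauses are illustrative and are not taken from the series. Unlike "parallel", a "kernels" region leaves it to the compiler to decide which loops can safely run in parallel, and that decision is what the dependence analysis feeds. The first loop has no loop-carried dependence. The second loop, a prefix sum, does have one. The comments describe only the dependence structure, not what GCC actually decides for either loop.

```c
#include <stddef.h>

/* Illustrative OpenACC "kernels" region (hypothetical function).  Inside
   "kernels", the compiler, not the programmer, decides which loops to
   parallelize.  */
void
scale_and_scan (size_t n, const float *restrict b,
                float *restrict a, float *restrict s)
{
#pragma acc kernels copyin(b[0:n]) copyout(a[0:n], s[0:n])
  {
    /* No loop-carried dependence: iteration i writes only a[i] and reads
       only b[i], so the iterations are independent.  */
    for (size_t i = 0; i < n; i++)
      a[i] = 2.0f * b[i];

    /* Loop-carried dependence: iteration i reads s[i - 1], which iteration
       i - 1 wrote, so naive parallelization would be wrong.  */
    if (n > 0)
      {
        s[0] = b[0];
        for (size_t i = 1; i < n; i++)
          s[i] = s[i - 1] + b[i];
      }
  }
}
```

If this is compiled without -fopenacc, GCC ignores the pragma and the function runs sequentially on the host, which is a convenient way to check its results.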

Best regards,
Frederik

PS: The series also includes a backport from master,
"00b98b6cac25 Add dg-final option-based target selectors", and two
trivial, unrelated commits, "fa558c2a6664 Fix gimple_debug_cfg
declaration" and "35cdc94463fe Fix branch prediction dump message".



Andrew Stubbs (2):
  openacc: Add data optimization pass
  openacc: Add runtime alias checking for OpenACC kernels

Frederik Harwath (19):
  openacc: Move pass_oacc_device_lower after pass_graphite
  graphite: Extend SCoP detection dump output
  graphite: Rename isl_id_for_ssa_name
  graphite: Fix minor mistakes in comments
  Fix branch prediction dump message
  Move compute_alias_check_pairs to tree-data-ref.c
  graphite: Add runtime alias checking
  openacc: Use Graphite for dependence analysis in "kernels" regions
  openacc: Add "can_be_parallel" flag info to "graph" dumps
  openacc: Add further kernels tests
  openacc: Remove unused partitioning in "kernels" regions
  Add function for printing a single OMP_CLAUSE
  openacc: Warn about "independent" "kernels" loops with
    data-dependences
  openacc: Handle internal function calls in pass_lim
  openacc: Disable pass_pre on outlined functions analyzed by Graphite
  graphite: Tune parameters for OpenACC use
  graphite: Adjust scop loop-nest choice
  graphite: Accept loops without data references
  openacc: Adjust test expectations to new "kernels" handling

Sandra Loosemore (1):
  Fortran: delinearize multi-dimensional array accesses
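
The two "runtime alias checking" commits target loops whose pointer operands cannot be proven disjoint at compile time. The general technique, familiar from GCC's loop vectorizer, is loop versioning. The compiler emits a cheap overlap test and then chooses at run time between a parallelizable copy of the loop and a conservative one. The hand-written C sketch below shows only that idea. It is not the code GCC generates, and ranges_overlap and add_one are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if the float ranges [p, p + n) and
   [q, q + n) overlap.  Comparing uintptr_t values sidesteps relational
   comparisons between pointers into unrelated objects.  */
static bool
ranges_overlap (const float *p, const float *q, size_t n)
{
  uintptr_t pa = (uintptr_t) p;
  uintptr_t qa = (uintptr_t) q;
  uintptr_t bytes = (uintptr_t) n * sizeof (float);
  return pa < qa + bytes && qa < pa + bytes;
}

/* Loop versioning on a runtime alias check (sketch).  */
void
add_one (float *dst, const float *src, size_t n)
{
  if (!ranges_overlap (dst, src, n))
    {
      /* Disjoint ranges: the iterations are independent, so this is the
         copy a compiler could parallelize or offload.  */
      for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1.0f;
    }
  else
    {
      /* Possible overlap: keep the sequential order, because a store to
         dst[i] may feed a later load from src.  */
      for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1.0f;
    }
}
```

Both branches compute the same result here. The point is that only the first branch may be reordered. For example, with z = {0, 0, 0, 0}, the call add_one (z + 1, z, 3) overlaps, so it takes the sequential branch and yields z = {0, 1, 2, 3}. A loop whose iterations all read the old values would instead produce {0, 1, 1, 1}.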

 gcc/Makefile.in                               |    2 +
 gcc/cfgloop.c                                 |    1 +
 gcc/cfgloop.h                                 |    6 +
 gcc/cfgloopmanip.c                            |    1 +
 gcc/common.opt                                |    9 +
 gcc/config/nvptx/nvptx.c                      |    7 +
 gcc/doc/gimple.texi                           |    2 +
 gcc/doc/invoke.texi                           |   20 +-
 gcc/doc/passes.texi                           |    6 +-
 gcc/expr.c                                    |    1 +
 gcc/flag-types.h                              |    1 +
 gcc/fortran/lang.opt                          |    4 +
 gcc/fortran/trans-array.c                     |  321 ++++--
 gcc/gimple-loop-interchange.cc                |    2 +-
 gcc/gimple-pretty-print.c                     |    3 +
 gcc/gimple-walk.c                             |   15 +-
 gcc/gimple-walk.h                             |    6 +
 gcc/gimple.h                                  |    7 +-
 gcc/gimplify.c                                |   13 +-
 gcc/graph.c                                   |   35 +-
 gcc/graphite-dependences.c                    |  220 +++-
 gcc/graphite-isl-ast-to-gimple.c              |  271 ++++-
 gcc/graphite-oacc.c                           |  689 ++++++++++++
 gcc/graphite-oacc.h                           |   55 +
 gcc/graphite-optimize-isl.c                   |   42 +-
 gcc/graphite-poly.c                           |   41 +-
 gcc/graphite-scop-detection.c                 |  654 +++++++++--
 gcc/graphite-sese-to-poly.c                   |   90 +-
 gcc/graphite.c                                |  120 +-
 gcc/graphite.h                                |   40 +-
 gcc/internal-fn.c                             |    2 +
 gcc/internal-fn.h                             |    4 +-
 gcc/omp-data-optimize.cc                      |  951 ++++++++++++++++
 gcc/omp-expand.c                              |  110 +-
 gcc/omp-general.c                             |   23 +-
 gcc/omp-general.h                             |    1 +
 gcc/omp-low.c                                 |  321 +++++-
 gcc/omp-oacc-kernels-decompose.cc             |  145 ++-
 gcc/omp-offload.c                             | 1001 +++++++++++++----
 gcc/omp-offload.h                             |    2 +
 gcc/params.opt                                |    5 +-
 gcc/passes.c                                  |   42 +
 gcc/passes.def                                |   47 +-
 gcc/predict.c                                 |    2 +-
 gcc/sese.c                                    |   25 +-
 gcc/sese.h                                    |   19 +
 gcc/testsuite/c-c++-common/goacc/acc-icf.c    |    4 +-
 gcc/testsuite/c-c++-common/goacc/cache-3-1.c  |    2 +-
 ...classify-kernels-unparallelized-graphite.c |   41 +
 ...lassify-kernels-unparallelized-parloops.c} |   12 +-
 .../c-c++-common/goacc/classify-kernels.c     |   27 +-
 .../c-c++-common/goacc/classify-parallel.c    |    8 +-
 .../c-c++-common/goacc/classify-routine.c     |    8 +-
 .../c-c++-common/goacc/classify-serial.c      |   12 +-
 .../device-lowering-debug-optimization.c      |   29 +
 .../goacc/device-lowering-no-loops.c          |   17 +
 .../goacc/device-lowering-no-optimization.c   |   30 +
 .../c-c++-common/goacc/if-clause-2.c          |    2 +-
 .../goacc/kernels-decompose-1-parloops.c      |  125 ++
 .../c-c++-common/goacc/kernels-decompose-1.c  |   31 +-
 .../c-c++-common/goacc/kernels-decompose-2.c  |    2 +-
 .../goacc/kernels-decompose-ice-1.c           |    5 +-
 .../goacc/kernels-decompose-ice-2.c           |    3 +-
 .../goacc/kernels-loop-3-acc-loop.c           |    2 +-
 .../c-c++-common/goacc/kernels-loop-3.c       |    2 +-
 ...duction.c => kernels-reduction-parloops.c} |    0
 .../c-c++-common/goacc/loop-2-kernels.c       |   20 +-
 .../c-c++-common/goacc/loop-auto-reductions.c |   22 +
 .../goacc/nested-reductions-2-parallel.c      |  138 +++
 ...kernels-conditional-loop-independent_seq.c |  129 ---
 ...parallelism-1-kernels-loop-auto-parloops.c |  128 +++
 .../note-parallelism-1-kernels-loop-auto.c    |  104 +-
 ...rallelism-1-kernels-loop-independent_seq.c |   19 +-
 .../goacc/note-parallelism-1-kernels-loops.c  |   11 +-
 ...note-parallelism-1-kernels-straight-line.c |   11 +-
 ...e-parallelism-combined-kernels-loop-auto.c |   34 +-
 ...sm-combined-kernels-loop-independent_seq.c |   16 -
 ...kernels-conditional-loop-independent_seq.c |   38 +-
 .../note-parallelism-kernels-loop-auto.c      |  100 +-
 ...parallelism-kernels-loop-independent_seq.c |   27 +-
 .../goacc/note-parallelism-kernels-loops-1.c  |   61 +
 .../note-parallelism-kernels-loops-parloops.c |   53 +
 .../goacc/note-parallelism-kernels-loops.c    |   39 +-
 .../c-c++-common/goacc/omp_data_optimize-1.c  |  677 +++++++++++
 gcc/testsuite/c-c++-common/goacc/routine-1.c  |    2 +-
 .../goacc/routine-level-of-parallelism-2.c    |    2 -
 .../c-c++-common/goacc/routine-nohost-1.c     |    4 +-
 gcc/testsuite/c-c++-common/unroll-1.c         |    8 +-
 gcc/testsuite/c-c++-common/unroll-4.c         |    4 +-
 .../g++.dg/goacc/omp_data_optimize-1.C        |  169 +++
 .../gcc.dg/goacc/graphite-parameter-1.c       |   21 +
 .../gcc.dg/goacc/graphite-parameter-2.c       |   23 +
 .../gcc.dg/goacc/loop-processing-1.c          |    7 +-
 .../gcc.dg/goacc/nested-function-1.c          |    3 +-
 gcc/testsuite/gcc.dg/graphite/alias-1.c       |   22 +
 gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c    |    6 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c    |    4 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c    |    4 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c    |    6 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c    |    4 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c    |    6 +-
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c     |    6 +-
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c     |    4 +-
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c     |    4 +-
 gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c      |    2 +-
 gcc/testsuite/gcc.dg/tree-ssa/loop-38.c       |    4 +-
 gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c |    2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr21463.c       |    4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr45427.c       |    4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c     |    2 +-
 gcc/testsuite/gcc.dg/unroll-2.c               |    2 +-
 gcc/testsuite/gcc.dg/unroll-3.c               |    4 +-
 gcc/testsuite/gcc.dg/unroll-4.c               |    4 +-
 gcc/testsuite/gcc.dg/unroll-5.c               |    4 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-59.c         |    2 +-
 gcc/testsuite/gcc.dg/vect/vect-profile-1.c    |    2 +-
 gcc/testsuite/gfortran.dg/assumed_type_2.f90  |    6 +-
 ...assify-kernels-unparallelized-parloops.f95 |   44 +
 .../goacc/classify-kernels-unparallelized.f95 |   26 +-
 .../gfortran.dg/goacc/classify-kernels.f95    |   26 +-
 .../gfortran.dg/goacc/classify-parallel.f95   |    6 +-
 .../gfortran.dg/goacc/classify-routine.f95    |    8 +-
 .../gfortran.dg/goacc/classify-serial.f95     |   11 +-
 .../gfortran.dg/goacc/common-block-3.f90      |   14 +-
 .../gfortran.dg/goacc/gang-static.f95         |   14 +-
 .../gfortran.dg/goacc/kernels-conversion.f95  |   52 +
 .../goacc/kernels-decompose-1-parloops.f95    |  121 ++
 .../gfortran.dg/goacc/kernels-decompose-1.f95 |  183 ++-
 .../gfortran.dg/goacc/kernels-decompose-2.f95 |  112 +-
 .../goacc/kernels-decompose-parloops-2.f95    |  154 +++
 .../gfortran.dg/goacc/kernels-loop-2.f95      |   13 +-
 .../gfortran.dg/goacc/kernels-loop-data-2.f95 |   13 +-
 .../goacc/kernels-loop-data-parloops-2.f95    |   52 +
 .../gfortran.dg/goacc/kernels-loop-inner.f95  |    6 +-
 .../goacc/kernels-loop-parloops-2.f95         |   45 +
 .../goacc/kernels-loop-parloops.f95           |   39 +
 .../gfortran.dg/goacc/kernels-loop.f95        |   12 +-
 .../gfortran.dg/goacc/kernels-reductions.f90  |   37 +
 .../gfortran.dg/goacc/kernels-tree.f95        |    2 +-
 .../gfortran.dg/goacc/loop-2-kernels.f95      |   22 +-
 .../goacc/loop-auto-transfer-2.f90            |   45 +
 .../goacc/loop-auto-transfer-3.f90            |   95 ++
 .../goacc/loop-auto-transfer-4.f90            |  293 +++++
 .../gfortran.dg/goacc/nested-function-1.f90   |    2 +
 .../goacc/nested-reductions-2-parallel.f90    |  177 +++
 .../gfortran.dg/goacc/omp_data_optimize-1.f90 |  588 ++++++++++
 gcc/testsuite/gfortran.dg/goacc/pr72741.f90   |    8 +-
 .../goacc/private-explicit-kernels-1.f95      |   13 +-
 .../goacc/private-predetermined-kernels-1.f95 |   16 +-
 .../goacc/routine-module-mod-1.f90            |    2 +-
 gcc/testsuite/gfortran.dg/graphite/block-2.f  |    9 +-
 .../gfortran.dg/graphite/block-3.f90          |    1 -
 .../gfortran.dg/graphite/block-4.f90          |    1 -
 gcc/testsuite/gfortran.dg/graphite/id-9.f     |    2 +-
 .../gfortran.dg/inline_matmul_24.f90          |    2 +-
 gcc/testsuite/gfortran.dg/no_arg_check_2.f90  |    6 +-
 gcc/testsuite/gfortran.dg/pr32921.f           |    2 +-
 gcc/testsuite/gfortran.dg/reassoc_4.f         |    2 +-
 gcc/tree-chrec.c                              |    3 +
 gcc/tree-data-ref.c                           |  107 +-
 gcc/tree-data-ref.h                           |    3 +
 gcc/tree-loop-distribution.c                  |   87 --
 gcc/tree-parloops.c                           |   18 +-
 gcc/tree-pass.h                               |    3 +
 gcc/tree-pretty-print.c                       |   11 +
 gcc/tree-pretty-print.h                       |    1 +
 gcc/tree-scalar-evolution.c                   |  179 ++-
 gcc/tree-scalar-evolution.h                   |    3 +
 gcc/tree-ssa-dce.c                            |   14 +
 gcc/tree-ssa-loop-im.c                        |   58 +-
 gcc/tree-ssa-loop-ivcanon.c                   |    2 +
 gcc/tree-ssa-loop-manip.h                     |    2 +-
 gcc/tree-ssa-loop-niter.c                     |    6 +
 gcc/tree-ssa-loop.c                           |  110 ++
 gcc/tree-ssa-phiprop.c                        |    2 +
 gcc/tree-ssa-pre.c                            |   17 +
 .../acc_prof-kernels-1.c                      |   19 +-
 .../kernels-decompose-1.c                     |    7 +-
 .../libgomp.oacc-c-c++-common/parallel-dims.c |   34 +-
 .../libgomp.oacc-c-c++-common/pr84955-1.c     |    1 -
 .../libgomp.oacc-c-c++-common/pr85381-2.c     |    8 +-
 .../libgomp.oacc-c-c++-common/pr85381-3.c     |    3 -
 .../libgomp.oacc-c-c++-common/pr85381-4.c     |    4 +-
 .../libgomp.oacc-c-c++-common/pr85486-2.c     |    2 +-
 .../libgomp.oacc-c-c++-common/pr85486-3.c     |    2 +-
 .../libgomp.oacc-c-c++-common/pr85486.c       |    2 +-
 .../runtime-alias-check-1.c                   |   79 ++
 .../runtime-alias-check-2.c                   |   90 ++
 .../vector-length-128-1.c                     |    3 +-
 .../vector-length-128-2.c                     |    3 +-
 .../vector-length-128-3.c                     |    3 +-
 .../vector-length-128-4.c                     |    3 +-
 .../vector-length-128-5.c                     |    3 +-
 .../vector-length-128-6.c                     |    3 +-
 .../vector-length-128-7.c                     |    3 +-
 .../gangprivate-attrib-1.f90                  |    5 +-
 .../gangprivate-attrib-2.f90                  |    3 +-
 .../kernels-acc-loop-reduction-2.f90          |   12 +-
 .../kernels-independent.f90                   |    1 +
 .../libgomp.oacc-fortran/kernels-loop-1.f90   |    1 +
 .../libgomp.oacc-fortran/pr94358-1.f90        |    7 +-
 201 files changed, 9403 insertions(+), 1524 deletions(-)
 create mode 100644 gcc/graphite-oacc.c
 create mode 100644 gcc/graphite-oacc.h
 create mode 100644 gcc/omp-data-optimize.cc
 create mode 100644 gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c
 rename gcc/testsuite/c-c++-common/goacc/{classify-kernels-unparallelized.c => classify-kernels-unparallelized-parloops.c} (84%)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-debug-optimization.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-no-loops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimization.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-1-parloops.c
 rename gcc/testsuite/c-c++-common/goacc/{kernels-reduction.c => kernels-reduction-parloops.c} (100%)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/loop-auto-reductions.c
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c
 create mode 100644 gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1-parloops.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-parloops-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-parloops-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c

--
2.33.0

-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955