From patchwork Wed Nov 17 16:03:09 2021
X-Patchwork-Submitter: Frederik Harwath
X-Patchwork-Id: 1556242
From: Frederik Harwath
Subject: [OG11][committed][PATCH 00/22] OpenACC "kernels" Improvements
Date: Wed, 17 Nov 2021 17:03:09 +0100
Message-ID: <20211117160330.20029-1-frederik@codesourcery.com>
List-Id: Gcc-patches mailing list

Hi,

this patch series implements the re-work of the OpenACC "kernels" implementation that has been announced at the GNU Tools Track of this year's Linux Plumbers Conference; see https://linuxplumbersconf.org/event/11/contributions/998/. The central step is contained in the commit titled 'openacc: Use Graphite for dependence analysis in "kernels" regions' whose commit message also contains further explanations.

Best regards,
Frederik

PS: The commit series also includes a backport from master "00b98b6cac25 Add dg-final option-based target selectors" and two trivial unrelated commits "fa558c2a6664 Fix gimple_debug_cfg declaration" and "35cdc94463fe Fix branch prediction dump message"

Andrew Stubbs (2):
  openacc: Add data optimization pass
  openacc: Add runtime alias checking for OpenACC kernels

Frederik Harwath (19):
  openacc: Move pass_oacc_device_lower after pass_graphite
  graphite: Extend SCoP detection dump output
  graphite: Rename isl_id_for_ssa_name
  graphite: Fix minor mistakes in comments
  Fix branch prediction dump message
  Move compute_alias_check_pairs to tree-data-ref.c
  graphite: Add runtime alias checking
  openacc: Use Graphite for dependence analysis in "kernels" regions
  openacc: Add "can_be_parallel" flag info to "graph" dumps
  openacc: Add further kernels tests
  openacc: Remove unused partitioning in "kernels" regions
  Add function for printing a single OMP_CLAUSE
  openacc: Warn about "independent" "kernels" loops with data-dependences
  openacc: Handle internal function calls in pass_lim
  openacc: Disable pass_pre on outlined functions analyzed by Graphite
  graphite: Tune parameters for OpenACC use
  graphite: Adjust scop loop-nest choice
  graphite: Accept loops without data references
  openacc: Adjust test expectations to new "kernels" handling

Sandra Loosemore (1):
  Fortran: delinearize multi-dimensional array accesses

gcc/Makefile.in | 2 + gcc/cfgloop.c | 1 + gcc/cfgloop.h | 6 + gcc/cfgloopmanip.c | 1 + gcc/common.opt | 9 + gcc/config/nvptx/nvptx.c | 7 + gcc/doc/gimple.texi | 2 + gcc/doc/invoke.texi | 20 +- gcc/doc/passes.texi | 6 +- gcc/expr.c | 1 + gcc/flag-types.h | 1 + gcc/fortran/lang.opt | 4 + gcc/fortran/trans-array.c | 321 ++++-- gcc/gimple-loop-interchange.cc | 2 +- gcc/gimple-pretty-print.c | 3 + gcc/gimple-walk.c | 15 +- gcc/gimple-walk.h | 6 + gcc/gimple.h | 7 +- gcc/gimplify.c | 13 +- gcc/graph.c | 35 +- gcc/graphite-dependences.c | 220 +++- 
gcc/graphite-isl-ast-to-gimple.c | 271 ++++- gcc/graphite-oacc.c | 689 ++++++++++++ gcc/graphite-oacc.h | 55 + gcc/graphite-optimize-isl.c | 42 +- gcc/graphite-poly.c | 41 +- gcc/graphite-scop-detection.c | 654 +++++++++-- gcc/graphite-sese-to-poly.c | 90 +- gcc/graphite.c | 120 +- gcc/graphite.h | 40 +- gcc/internal-fn.c | 2 + gcc/internal-fn.h | 4 +- gcc/omp-data-optimize.cc | 951 ++++++++++++++++ gcc/omp-expand.c | 110 +- gcc/omp-general.c | 23 +- gcc/omp-general.h | 1 + gcc/omp-low.c | 321 +++++- gcc/omp-oacc-kernels-decompose.cc | 145 ++- gcc/omp-offload.c | 1001 +++++++++++++---- gcc/omp-offload.h | 2 + gcc/params.opt | 5 +- gcc/passes.c | 42 + gcc/passes.def | 47 +- gcc/predict.c | 2 +- gcc/sese.c | 25 +- gcc/sese.h | 19 + gcc/testsuite/c-c++-common/goacc/acc-icf.c | 4 +- gcc/testsuite/c-c++-common/goacc/cache-3-1.c | 2 +- ...classify-kernels-unparallelized-graphite.c | 41 + ...lassify-kernels-unparallelized-parloops.c} | 12 +- .../c-c++-common/goacc/classify-kernels.c | 27 +- .../c-c++-common/goacc/classify-parallel.c | 8 +- .../c-c++-common/goacc/classify-routine.c | 8 +- .../c-c++-common/goacc/classify-serial.c | 12 +- .../device-lowering-debug-optimization.c | 29 + .../goacc/device-lowering-no-loops.c | 17 + .../goacc/device-lowering-no-optimization.c | 30 + .../c-c++-common/goacc/if-clause-2.c | 2 +- .../goacc/kernels-decompose-1-parloops.c | 125 ++ .../c-c++-common/goacc/kernels-decompose-1.c | 31 +- .../c-c++-common/goacc/kernels-decompose-2.c | 2 +- .../goacc/kernels-decompose-ice-1.c | 5 +- .../goacc/kernels-decompose-ice-2.c | 3 +- .../goacc/kernels-loop-3-acc-loop.c | 2 +- .../c-c++-common/goacc/kernels-loop-3.c | 2 +- ...duction.c => kernels-reduction-parloops.c} | 0 .../c-c++-common/goacc/loop-2-kernels.c | 20 +- .../c-c++-common/goacc/loop-auto-reductions.c | 22 + .../goacc/nested-reductions-2-parallel.c | 138 +++ ...kernels-conditional-loop-independent_seq.c | 129 --- ...parallelism-1-kernels-loop-auto-parloops.c | 128 +++ 
.../note-parallelism-1-kernels-loop-auto.c | 104 +- ...rallelism-1-kernels-loop-independent_seq.c | 19 +- .../goacc/note-parallelism-1-kernels-loops.c | 11 +- ...note-parallelism-1-kernels-straight-line.c | 11 +- ...e-parallelism-combined-kernels-loop-auto.c | 34 +- ...sm-combined-kernels-loop-independent_seq.c | 16 - ...kernels-conditional-loop-independent_seq.c | 38 +- .../note-parallelism-kernels-loop-auto.c | 100 +- ...parallelism-kernels-loop-independent_seq.c | 27 +- .../goacc/note-parallelism-kernels-loops-1.c | 61 + .../note-parallelism-kernels-loops-parloops.c | 53 + .../goacc/note-parallelism-kernels-loops.c | 39 +- .../c-c++-common/goacc/omp_data_optimize-1.c | 677 +++++++++++ gcc/testsuite/c-c++-common/goacc/routine-1.c | 2 +- .../goacc/routine-level-of-parallelism-2.c | 2 - .../c-c++-common/goacc/routine-nohost-1.c | 4 +- gcc/testsuite/c-c++-common/unroll-1.c | 8 +- gcc/testsuite/c-c++-common/unroll-4.c | 4 +- .../g++.dg/goacc/omp_data_optimize-1.C | 169 +++ .../gcc.dg/goacc/graphite-parameter-1.c | 21 + .../gcc.dg/goacc/graphite-parameter-2.c | 23 + .../gcc.dg/goacc/loop-processing-1.c | 7 +- .../gcc.dg/goacc/nested-function-1.c | 3 +- gcc/testsuite/gcc.dg/graphite/alias-1.c | 22 + gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c | 6 +- gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c | 6 +- gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c | 6 +- gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c | 6 +- gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/loop-38.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c | 2 +- gcc/testsuite/gcc.dg/tree-ssa/pr21463.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/pr45427.c | 4 +- gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c | 2 +- gcc/testsuite/gcc.dg/unroll-2.c | 2 +- 
gcc/testsuite/gcc.dg/unroll-3.c | 4 +- gcc/testsuite/gcc.dg/unroll-4.c | 4 +- gcc/testsuite/gcc.dg/unroll-5.c | 4 +- gcc/testsuite/gcc.dg/vect/bb-slp-59.c | 2 +- gcc/testsuite/gcc.dg/vect/vect-profile-1.c | 2 +- gcc/testsuite/gfortran.dg/assumed_type_2.f90 | 6 +- ...assify-kernels-unparallelized-parloops.f95 | 44 + .../goacc/classify-kernels-unparallelized.f95 | 26 +- .../gfortran.dg/goacc/classify-kernels.f95 | 26 +- .../gfortran.dg/goacc/classify-parallel.f95 | 6 +- .../gfortran.dg/goacc/classify-routine.f95 | 8 +- .../gfortran.dg/goacc/classify-serial.f95 | 11 +- .../gfortran.dg/goacc/common-block-3.f90 | 14 +- .../gfortran.dg/goacc/gang-static.f95 | 14 +- .../gfortran.dg/goacc/kernels-conversion.f95 | 52 + .../goacc/kernels-decompose-1-parloops.f95 | 121 ++ .../gfortran.dg/goacc/kernels-decompose-1.f95 | 183 ++- .../gfortran.dg/goacc/kernels-decompose-2.f95 | 112 +- .../goacc/kernels-decompose-parloops-2.f95 | 154 +++ .../gfortran.dg/goacc/kernels-loop-2.f95 | 13 +- .../gfortran.dg/goacc/kernels-loop-data-2.f95 | 13 +- .../goacc/kernels-loop-data-parloops-2.f95 | 52 + .../gfortran.dg/goacc/kernels-loop-inner.f95 | 6 +- .../goacc/kernels-loop-parloops-2.f95 | 45 + .../goacc/kernels-loop-parloops.f95 | 39 + .../gfortran.dg/goacc/kernels-loop.f95 | 12 +- .../gfortran.dg/goacc/kernels-reductions.f90 | 37 + .../gfortran.dg/goacc/kernels-tree.f95 | 2 +- .../gfortran.dg/goacc/loop-2-kernels.f95 | 22 +- .../goacc/loop-auto-transfer-2.f90 | 45 + .../goacc/loop-auto-transfer-3.f90 | 95 ++ .../goacc/loop-auto-transfer-4.f90 | 293 +++++ .../gfortran.dg/goacc/nested-function-1.f90 | 2 + .../goacc/nested-reductions-2-parallel.f90 | 177 +++ .../gfortran.dg/goacc/omp_data_optimize-1.f90 | 588 ++++++++++ gcc/testsuite/gfortran.dg/goacc/pr72741.f90 | 8 +- .../goacc/private-explicit-kernels-1.f95 | 13 +- .../goacc/private-predetermined-kernels-1.f95 | 16 +- .../goacc/routine-module-mod-1.f90 | 2 +- gcc/testsuite/gfortran.dg/graphite/block-2.f | 9 +- 
.../gfortran.dg/graphite/block-3.f90 | 1 - .../gfortran.dg/graphite/block-4.f90 | 1 - gcc/testsuite/gfortran.dg/graphite/id-9.f | 2 +- .../gfortran.dg/inline_matmul_24.f90 | 2 +- gcc/testsuite/gfortran.dg/no_arg_check_2.f90 | 6 +- gcc/testsuite/gfortran.dg/pr32921.f | 2 +- gcc/testsuite/gfortran.dg/reassoc_4.f | 2 +- gcc/tree-chrec.c | 3 + gcc/tree-data-ref.c | 107 +- gcc/tree-data-ref.h | 3 + gcc/tree-loop-distribution.c | 87 -- gcc/tree-parloops.c | 18 +- gcc/tree-pass.h | 3 + gcc/tree-pretty-print.c | 11 + gcc/tree-pretty-print.h | 1 + gcc/tree-scalar-evolution.c | 179 ++- gcc/tree-scalar-evolution.h | 3 + gcc/tree-ssa-dce.c | 14 + gcc/tree-ssa-loop-im.c | 58 +- gcc/tree-ssa-loop-ivcanon.c | 2 + gcc/tree-ssa-loop-manip.h | 2 +- gcc/tree-ssa-loop-niter.c | 6 + gcc/tree-ssa-loop.c | 110 ++ gcc/tree-ssa-phiprop.c | 2 + gcc/tree-ssa-pre.c | 17 + .../acc_prof-kernels-1.c | 19 +- .../kernels-decompose-1.c | 7 +- .../libgomp.oacc-c-c++-common/parallel-dims.c | 34 +- .../libgomp.oacc-c-c++-common/pr84955-1.c | 1 - .../libgomp.oacc-c-c++-common/pr85381-2.c | 8 +- .../libgomp.oacc-c-c++-common/pr85381-3.c | 3 - .../libgomp.oacc-c-c++-common/pr85381-4.c | 4 +- .../libgomp.oacc-c-c++-common/pr85486-2.c | 2 +- .../libgomp.oacc-c-c++-common/pr85486-3.c | 2 +- .../libgomp.oacc-c-c++-common/pr85486.c | 2 +- .../runtime-alias-check-1.c | 79 ++ .../runtime-alias-check-2.c | 90 ++ .../vector-length-128-1.c | 3 +- .../vector-length-128-2.c | 3 +- .../vector-length-128-3.c | 3 +- .../vector-length-128-4.c | 3 +- .../vector-length-128-5.c | 3 +- .../vector-length-128-6.c | 3 +- .../vector-length-128-7.c | 3 +- .../gangprivate-attrib-1.f90 | 5 +- .../gangprivate-attrib-2.f90 | 3 +- .../kernels-acc-loop-reduction-2.f90 | 12 +- .../kernels-independent.f90 | 1 + .../libgomp.oacc-fortran/kernels-loop-1.f90 | 1 + .../libgomp.oacc-fortran/pr94358-1.f90 | 7 +- 201 files changed, 9403 insertions(+), 1524 deletions(-) create mode 100644 gcc/graphite-oacc.c create mode 100644 
gcc/graphite-oacc.h create mode 100644 gcc/omp-data-optimize.cc create mode 100644 gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c rename gcc/testsuite/c-c++-common/goacc/{classify-kernels-unparallelized.c => classify-kernels-unparallelized-parloops.c} (84%) create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-debug-optimization.c create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-no-loops.c create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimization.c create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-1-parloops.c rename gcc/testsuite/c-c++-common/goacc/{kernels-reduction.c => kernels-reduction-parloops.c} (100%) create mode 100644 gcc/testsuite/c-c++-common/goacc/loop-auto-reductions.c delete mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-1.c create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c create mode 100644 gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c create mode 100644 gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c create mode 100644 gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1-parloops.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-parloops-2.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-parloops-2.f95 create mode 100644 
gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops-2.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90 create mode 100644 gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c --- 2.33.0