From patchwork Thu Nov 5 10:57:12 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tom de Vries
X-Patchwork-Id: 540349
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Subject: [gomp4, committed, 5/9] Handle oacc function in parloops
To: "gcc-patches@gnu.org"
References: <563B2C99.90308@mentor.com>
CC: Jakub Jelinek
From: Tom de Vries
Message-ID: <563B3608.9040707@mentor.com>
Date: Thu, 5 Nov 2015 11:57:12 +0100
In-Reply-To: <563B2C99.90308@mentor.com>

On 05/11/15 11:16, Tom de Vries wrote:
> Hi,
>
> now that we have committed -foffload-alias in gomp-4_0-branch (
> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00214.html
> ), we no longer
> need the kernels region to be a part of the original function when doing
> alias analysis.
>
> So, we no longer have the need to postpone splitting off the kernels
> region into a separate function until after alias analysis, but we can
> do this at the same time as when we expand the parallel region.
>
> The following patch series implements that:
>
> 1 Move expansion of kernels region back to first omp-expand
> 2 Update gate_oacc_kernels to handle oacc function
> 3 Revert "Add skip_stmt parm to pass_dominator::get_sese ()"
> 4 Revert "Add pass_dominator::sese_mode_p ()"
> 5 Handle oacc function in parloops
> 6 Update goacc kernels C testcases
> 7 Update goacc kernels Fortran testcases
> 8 Release_defs in expand_omp_atomic_fetch_op
> 9 Remove BUILT_IN_GOACC_KERNELS_INTERNAL
>
> [ The patch series is broken up into logical bits, but intended as
> a single commit. Various things in kernels support will be broken in
> intermediate stages. ]
>
> Committed to gomp-4_0-branch.
>
> I'll post the patches in reply to this message.

This patch removes handling of kernels regions in tree-parloops.c, and
adds handling of oacc functions that used to be kernels regions before
they were split off.

That means we no longer add a parallel pragma. OTOH, we now have to
clear PROP_gimple_eomp in order to trigger the subsequent omp-expand pass.

Thanks,
- Tom

Handle oacc function in parloops

2015-11-04  Tom de Vries

	* omp-low.c (set_oacc_fn_attrib): Remove static.
	* omp-low.h (set_oacc_fn_attrib): Declare.
	* tree-parloops.c (create_parallel_loop): Remove region_entry parameter.
	Remove handling of oacc kernels pragma and GOACC_kernels_internal call.
	Remove insertion of oacc parallel pragma.  Set oacc function attributes.
	(gen_parallel_loop): Remove region_entry parameter.
	(get_omp_data_i_param): New function.
	(try_create_reduction_list): Use get_omp_data_i_param instead of
	gimple_stmt_omp_data_i_init_p.
	(ref_conflicts_with_region): Handle GIMPLE_RETURN.
	(oacc_entry_exit_ok_1): Same.  Add missing is_gimple_call test before
	gimple_call_internal_p test.
	(oacc_entry_exit_ok): Remove region_entry parameter.  Use
	get_omp_data_i_param instead of get_omp_data_i.  Set region_bbs to all
	bbs in function.  Use function entry as region entry.
	(parallelize_loops): Allow oacc functions and parallelized function if
	oacc_kernels_p.  Remove region_entry variable.
	(pass_parallelize_loops_oacc_kernels::execute): Clear PROP_gimple_eomp
	if a loop was parallelized.
---
 gcc/omp-low.c       |   2 +-
 gcc/omp-low.h       |   1 +
 gcc/tree-parloops.c | 119 ++++++++++++++++++++++------------------------------
 3 files changed, 51 insertions(+), 71 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index ac8c8d0..58cb959 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12456,7 +12456,7 @@ replace_oacc_fn_attrib (tree fn, tree dims)
    function attribute.  Push any that are non-constant onto the ARGS
    list, along with an appropriate GOMP_LAUNCH_DIM tag.  */
 
-static void
+void
 set_oacc_fn_attrib (tree fn, tree clauses, vec<tree> *args)
 {
   /* Must match GOMP_DIM ordering.  */
diff --git a/gcc/omp-low.h b/gcc/omp-low.h
index 7c9efdc..673b470 100644
--- a/gcc/omp-low.h
+++ b/gcc/omp-low.h
@@ -40,6 +40,7 @@ extern vec<basic_block> get_bbs_in_oacc_kernels_region (basic_block,
 extern void replace_oacc_fn_attrib (tree, tree);
 extern tree build_oacc_routine_dims (tree);
 extern tree get_oacc_fn_attrib (tree);
+extern void set_oacc_fn_attrib (tree, tree, vec<tree> *);
 
 extern GTY(()) vec<tree, va_gc> *offload_funcs;
 extern GTY(()) vec<tree, va_gc> *offload_vars;
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index f14cf8a..c038dfe 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2017,7 +2017,7 @@ transform_to_exit_first_loop (struct loop *loop,
 static void
 create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
                       tree new_data, unsigned n_threads, location_t loc,
-                      basic_block region_entry, bool oacc_kernels_p)
+                      bool oacc_kernels_p)
 {
   gimple_stmt_iterator gsi;
   basic_block bb, paral_bb, for_bb, ex_bb, continue_bb;
@@ -2039,10 +2039,6 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
       paral_bb = single_pred (bb);
       gsi = gsi_last_bb (paral_bb);
     }
-  else
-    /* Make sure the oacc parallel is inserted on top of the oacc kernels
-       region.  */
-    gsi = gsi_last_bb (region_entry);
 
   if (!oacc_kernels_p)
     {
@@ -2056,50 +2052,10 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
     }
   else
     {
-      /* Create oacc parallel pragma based on oacc kernels pragma and
-         GOACC_kernels_internal call.  */
-      gomp_target *kernels = as_a <gomp_target *> (gsi_stmt (gsi));
-
-      tree clauses = gimple_omp_target_clauses (kernels);
-      /* FIXME: We need a more intelligent mapping onto vector, gangs,
-         workers.  */
-      if (1)
-        {
-          tree clause = build_omp_clause (gimple_location (kernels),
-                                          OMP_CLAUSE_NUM_GANGS);
-          OMP_CLAUSE_NUM_GANGS_EXPR (clause)
-            = build_int_cst (integer_type_node, n_threads);
-          OMP_CLAUSE_CHAIN (clause) = clauses;
-          clauses = clause;
-        }
-      gomp_target *stmt
-        = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_OACC_PARALLEL,
-                                   clauses);
-      tree child_fn = gimple_omp_target_child_fn (kernels);
-      gimple_omp_target_set_child_fn (stmt, child_fn);
-      tree data_arg = gimple_omp_target_data_arg (kernels);
-      gimple_omp_target_set_data_arg (stmt, data_arg);
-
-      gimple_set_location (stmt, loc);
-
-      /* Insert oacc parallel pragma after the oacc kernels pragma.  */
-      {
-        gimple_stmt_iterator gsi2;
-        gsi = gsi_last_bb (region_entry);
-        gsi2 = gsi;
-        gsi_prev (&gsi2);
-
-        /* Insert pragma acc parallel.  */
-        gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
-
-        /* Remove GOACC_kernels_internal call.  */
-        replace_uses_by (gimple_vdef (gsi_stmt (gsi2)),
-                         gimple_vuse (gsi_stmt (gsi2)));
-        gsi_remove (&gsi2, true);
-
-        /* Remove pragma acc kernels.  */
-        gsi_remove (&gsi2, true);
-      }
+      tree clause = build_omp_clause (loc, OMP_CLAUSE_NUM_GANGS);
+      OMP_CLAUSE_NUM_GANGS_EXPR (clause)
+        = build_int_cst (integer_type_node, n_threads);
+      set_oacc_fn_attrib (cfun->decl, clause, NULL);
     }
 
   /* Initialize NEW_DATA.  */
@@ -2274,7 +2230,7 @@ static void
 gen_parallel_loop (struct loop *loop,
                    reduction_info_table_type *reduction_list,
                    unsigned n_threads, struct tree_niter_desc *niter,
-                   basic_block region_entry, bool oacc_kernels_p)
+                   bool oacc_kernels_p)
 {
   tree many_iterations_cond, type, nit;
   tree arg_struct, new_arg_struct;
@@ -2457,7 +2413,7 @@ gen_parallel_loop (struct loop *loop,
   if (cond_stmt)
     loc = gimple_location (cond_stmt);
   create_parallel_loop (loop, create_loop_fn (loc), arg_struct, new_arg_struct,
-                        n_threads, loc, region_entry, oacc_kernels_p);
+                        n_threads, loc, oacc_kernels_p);
   if (reduction_list->elements () > 0)
     create_call_for_reduction (loop, reduction_list, &clsn_data);
 
@@ -2650,6 +2606,22 @@ try_get_loop_niter (loop_p loop, struct tree_niter_desc *niter)
   return true;
 }
 
+static tree
+get_omp_data_i_param (void)
+{
+  tree decl = DECL_ARGUMENTS (cfun->decl);
+  gcc_assert (DECL_CHAIN (decl) == NULL_TREE);
+  for (unsigned int i = 0; i < num_ssa_names; ++i)
+    {
+      tree name = ssa_name (i);
+      if (name != NULL_TREE
+          && SSA_NAME_VAR (name) == decl)
+        return name;
+    }
+
+  gcc_unreachable ();
+}
+
 /* Try to initialize REDUCTION_LIST for code generation part.
    REDUCTION_LIST describes the reductions.  */
 
@@ -2795,7 +2767,7 @@ try_create_reduction_list (loop_p loop,
             return false;
           addr2 = TREE_OPERAND (addr2, 0);
           if (TREE_CODE (addr2) != SSA_NAME
-              || !gimple_stmt_omp_data_i_init_p (SSA_NAME_DEF_STMT (addr2)))
+              || addr2 != get_omp_data_i_param ())
             return false;
           red->reduc_addr = addr;
         }
@@ -2849,6 +2821,9 @@ ref_conflicts_with_region (gimple_stmt_iterator gsi, ao_ref *ref,
           && !gimple_vuse (stmt))
         continue;
 
+      if (gimple_code (stmt) == GIMPLE_RETURN)
+        continue;
+
       if (ref_is_store)
         {
           if (dead_load_p (stmt))
@@ -2989,9 +2964,12 @@ oacc_entry_exit_ok_1 (bitmap in_loop_bbs, vec<basic_block> region_bbs,
                    && !gimple_vdef (stmt)
                    && !gimple_vuse (stmt))
             continue;
-          else if (gimple_call_internal_p (stmt)
+          else if (is_gimple_call (stmt)
+                   && gimple_call_internal_p (stmt)
                    && gimple_call_internal_fn (stmt) == IFN_GOACC_DIM_POS)
             continue;
+          else if (gimple_code (stmt) == GIMPLE_RETURN)
+            continue;
           else
             {
               if (dump_file)
@@ -3119,19 +3097,17 @@ oacc_entry_exit_single_gang (bitmap in_loop_bbs, vec<basic_block> region_bbs,
 }
 
 static bool
-oacc_entry_exit_ok (struct loop *loop, basic_block region_entry,
+oacc_entry_exit_ok (struct loop *loop,
                     reduction_info_table_type *reduction_list)
 {
   basic_block *loop_bbs = get_loop_body_in_dom_order (loop);
-  basic_block region_exit
-    = get_oacc_kernels_region_exit (single_succ (region_entry));
-  vec<basic_block> region_bbs
-    = get_bbs_in_oacc_kernels_region (region_entry, region_exit);
-  tree omp_data_i = get_omp_data_i (region_entry);
+  tree omp_data_i = get_omp_data_i_param ();
   gcc_assert (omp_data_i != NULL_TREE);
+  vec<basic_block> region_bbs
+    = get_all_dominated_blocks (CDI_DOMINATORS, ENTRY_BLOCK_PTR_FOR_FN (cfun));
 
-  gimple_stmt_iterator gsi = gsi_for_stmt (SSA_NAME_DEF_STMT (omp_data_i));
-  gsi_next_nondebug (&gsi);
+  gimple_stmt_iterator gsi
+    = gsi_start_bb (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
   gimple *stmt = gsi_stmt (gsi);
   gcc_assert (gimple_call_internal_p (stmt)
               && gimple_call_internal_fn (stmt) == IFN_GOACC_DIM_POS);
@@ -3182,15 +3158,16 @@ parallelize_loops (bool oacc_kernels_p)
   struct obstack parloop_obstack;
   HOST_WIDE_INT estimated;
   source_location loop_loc;
-  basic_block region_entry = NULL;
 
   /* Do not parallelize loops in the functions created by parallelization.  */
-  if (parallelized_function_p (cfun->decl))
+  if (!oacc_kernels_p
+      && parallelized_function_p (cfun->decl))
     return false;
 
   /* Do not parallelize loops in offloaded functions.  */
-  if (get_oacc_fn_attrib (cfun->decl) != NULL)
-    return false;
+  if (!oacc_kernels_p
+      && get_oacc_fn_attrib (cfun->decl) != NULL)
+    return false;
 
   if (cfun->has_nonlocal_label)
     return false;
@@ -3231,8 +3208,6 @@ parallelize_loops (bool oacc_kernels_p)
             fprintf (dump_file,
                      "Trying loop %d with header bb %d in oacc kernels region\n",
                      loop->num, loop->header->index);
-
-          region_entry = loop_get_oacc_kernels_region_entry (loop);
         }
 
       if (dump_file && (dump_flags & TDF_DETAILS))
@@ -3309,7 +3284,7 @@ parallelize_loops (bool oacc_kernels_p)
         }
 
       if (oacc_kernels_p
-          && !oacc_entry_exit_ok (loop, region_entry, &reduction_list))
+          && !oacc_entry_exit_ok (loop, &reduction_list))
         {
           if (dump_file)
             fprintf (dump_file, "entry/exit not ok: FAILED\n");
@@ -3332,7 +3307,7 @@ parallelize_loops (bool oacc_kernels_p)
         }
 
       gen_parallel_loop (loop, &reduction_list,
-                         n_threads, &niter_desc, region_entry, oacc_kernels_p);
+                         n_threads, &niter_desc, oacc_kernels_p);
     }
 
   obstack_free (&parloop_obstack, NULL);
@@ -3437,7 +3412,11 @@ pass_parallelize_loops_oacc_kernels::execute (function *fun)
     return 0;
 
   if (parallelize_loops (true))
-    return TODO_update_ssa;
+    {
+      fun->curr_properties &= ~(PROP_gimple_eomp);
+
+      return TODO_update_ssa;
+    }
 
   return 0;
 }
-- 
1.9.1
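
For readers unfamiliar with why clearing PROP_gimple_eomp re-triggers the omp-expand pass, here is a minimal stand-alone sketch of the mechanism. It is not GCC code: Function, PROP_expanded, expand_pass_gate, run_expand_pass and mark_needs_expansion are hypothetical stand-ins, and the sketch assumes that the expansion pass is gated on the property bit being clear and sets the bit once it has run.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a property bit such as PROP_gimple_eomp.
constexpr std::uint32_t PROP_expanded = 1u << 0;

// Hypothetical stand-in for the per-function state the pass manager tracks.
struct Function
{
  std::uint32_t curr_properties = 0;
};

// Assumed gate: the expansion pass only runs while the bit is clear.
bool
expand_pass_gate (const Function &fn)
{
  return (fn.curr_properties & PROP_expanded) == 0;
}

// Runs the (modelled) expansion pass if its gate allows it, and records
// that expansion has been done.  Returns true if the pass actually ran.
bool
run_expand_pass (Function &fn)
{
  if (!expand_pass_gate (fn))
    return false;
  fn.curr_properties |= PROP_expanded;
  return true;
}

// Models what a pass does after it introduces new unexpanded constructs:
// clear the bit so that a later expansion pass is no longer gated off.
void
mark_needs_expansion (Function &fn)
{
  fn.curr_properties &= ~PROP_expanded;
}
```

In this model, a second call to run_expand_pass is a no-op until mark_needs_expansion has been called, which mirrors the role of the `fun->curr_properties &= ~(PROP_gimple_eomp)` statement in the last hunk above.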