From patchwork Sat Sep 27 18:16:47 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ilya Verbin X-Patchwork-Id: 394085 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 46E7E140092 for ; Sun, 28 Sep 2014 04:17:12 +1000 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:mime-version:content-type; q=dns; s=default; b=jCZoXRZPuo4KCBhcA50HjgMG6BPMt+G7rlCA6B6SsPFa3unlxc qcE+7mzbjJNEqVr/VqAQERAs/5Tbf8vI7otzNsGowXmePO6YZ5fJc2qUNvPyseiq cEOqKjc/KOBaCH7L58psw5r2JRsoQafs9xE8WGBIoVxuMsW+rCSZrggHU= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:mime-version:content-type; s= default; bh=Hm+/D5YoTJZBWVzW1qlXCzg2W7U=; b=c+bF5PfA0GPcli7gHZXA 5AwG0b0jy1Q0VA1yrX8W8Zmnos80A8cEyjx41afUzuJqYVBr+bM1kJHWyWJOralF iaAYCBusyh3kE1WzHMiT01SEmh/DvBgkJ2RkavK+zSlvqeA5EMwxuD9vhlvj6qnE 2E+TqZxmrvudnHOC0OOVFs0= Received: (qmail 14324 invoked by alias); 27 Sep 2014 18:17:05 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 14303 invoked by uid 89); 27 Sep 2014 18:17:04 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.6 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, KAM_STOCKGEN, RCVD_IN_DNSWL_LOW, SPF_PASS, UNSUBSCRIBE_BODY autolearn=no version=3.3.2 X-HELO: mail-yh0-f41.google.com Received: from mail-yh0-f41.google.com (HELO mail-yh0-f41.google.com) (209.85.213.41) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Sat, 27 Sep 2014 18:17:02 +0000 Received: by mail-yh0-f41.google.com with SMTP id b6so5571861yha.0 for ; Sat, 27 Sep 2014 11:17:00 -0700 (PDT) X-Received: by 10.236.197.168 with SMTP id t28mr38164977yhn.35.1411841820530; Sat, 27 Sep 2014 11:17:00 -0700 (PDT) Received: from msticlxl57.ims.intel.com ([192.55.55.41]) by mx.google.com with ESMTPSA id y67sm4310037yhc.11.2014.09.27.11.16.55 for (version=TLSv1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Sat, 27 Sep 2014 11:16:59 -0700 (PDT) Date: Sat, 27 Sep 2014 22:16:47 +0400 From: Ilya Verbin To: Jakub Jelinek , Richard Biener , Jan Hubicka , gcc-patches@gcc.gnu.org Cc: Kirill Yukhin , Ilya Tocar , Andrey Turetskiy , Bernd Schmidt , Thomas Schwinge Subject: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming Message-ID: <20140927181647.GA1819@msticlxl57.ims.intel.com> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-IsSubscribed: yes Hello, This patch enables the streaming of LTO bytecode, needed by offload target, using existing LTO infrastructure. It creates new prefix for the section names (.gnu.target_lto_) and streams out the functions and variables with "omp declare target" attribute, including the functions for outlined '#pragma omp target' regions. The offload compiler (under ifdef ACCEL_COMPILER) reads and compiles these new sections. But I have doubts regarding the offload_lto_mode switch. Why I added it: The outlined target regions (say omp_fn0) contains references from the parent functions. And that's correct for the case when we stream out the host-side version of omp_fn0. But for the target version there are no parent functions, node->used_from_other_partition gets incorrect value (always 1), and offload compiler crashes on streaming in. Another solution is to remain referenced_from_other_partition_p and reachable_from_other_partition_p unchanged, then used_from_other_partition will have incorrect value for target regions, but the offload compiler will just ignore it. Which approach is better? Anyway, now it's bootstrapped and regtested on i686-linux and x86_64-linux. 2014-09-27 Ilya Verbin Ilya Tocar Andrey Turetskiy Bernd Schmidt gcc/ * cgraph.h (symtab_node): Add need_dump flag. * cgraphunit.c: Include lto-section-names.h. (initialize_offload): New function. (ipa_passes): Initialize offload and call ipa_write_summaries if there is something to write to OMP_SECTION_NAME_PREFIX sections. (symbol_table::compile): Call lto_streamer_hooks_init under flag_openmp. * ipa-inline-analysis.c (inline_generate_summary): Do not exit under flag_openmp. (inline_free_summary): Always remove hooks. * lto-cgraph.c (lto_set_symtab_encoder_in_partition): Exit if there is no need to encode the node. (referenced_from_other_partition_p, reachable_from_other_partition_p): Ignore references from non-target functions to target functions if we are streaming out target-side bytecode (offload lto mode). (select_what_to_dump): New function. * lto-section-names.h (OMP_SECTION_NAME_PREFIX): Define. (section_name_prefix): Declare. * lto-streamer.c (offload_lto_mode): New variable. (section_name_prefix): New variable. (lto_get_section_name): Use section_name_prefix instead of LTO_SECTION_NAME_PREFIX. * lto-streamer.h (select_what_to_dump): Declare. (offload_lto_mode): Declare. * omp-low.c (is_targetreg_ctx): New function. (create_omp_child_function, check_omp_nesting_restrictions): Use it. (expand_omp_target): Set mark_force_output for the target functions. (lower_omp_critical): Add target attribute for omp critical symbol. * passes.c (ipa_write_summaries): Call select_what_to_dump. gcc/lto/ * lto-object.c (lto_obj_add_section): Use section_name_prefix instead of LTO_SECTION_NAME_PREFIX. * lto-partition.c (add_symbol_to_partition_1): Always set node->need_dump to true. (lto_promote_cross_file_statics): Call select_what_to_dump. * lto.c (lto_section_with_id): Use section_name_prefix instead of LTO_SECTION_NAME_PREFIX. (read_cgraph_and_symbols): Read OMP_SECTION_NAME_PREFIX sections, if being built as an offload compiler. Thanks, -- Ilya diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 7481906..9ab970d 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -444,6 +444,11 @@ public: /* Set when init priority is set. */ unsigned in_init_priority_hash : 1; + /* Set when symbol needs to be dumped into LTO bytecode for LTO, + or in pragma omp target case, for separate compilation targeting + a different architecture. */ + unsigned need_dump : 1; + /* Ordering of all symtab entries. */ int order; diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c index b854e4b..4ab4c57 100644 --- a/gcc/cgraphunit.c +++ b/gcc/cgraphunit.c @@ -211,6 +211,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-nested.h" #include "gimplify.h" #include "dbgcnt.h" +#include "lto-section-names.h" /* Queue of cgraph nodes scheduled to be added into cgraph. This is a secondary queue used during optimization to accommodate passes that @@ -1994,9 +1995,40 @@ output_in_order (bool no_reorder) free (nodes); } +/* Check whether there is at least one function or global variable to offload. + */ + +static bool +initialize_offload (void) +{ + bool have_offload = false; + struct cgraph_node *node; + struct varpool_node *vnode; + + FOR_EACH_DEFINED_FUNCTION (node) + if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (node->decl))) + { + have_offload = true; + break; + } + + FOR_EACH_DEFINED_VARIABLE (vnode) + { + if (!lookup_attribute ("omp declare target", + DECL_ATTRIBUTES (vnode->decl)) + || TREE_CODE (vnode->decl) != VAR_DECL + || DECL_SIZE (vnode->decl) == 0) + continue; + have_offload = true; + } + + return have_offload; +} + static void ipa_passes (void) { + bool have_offload = false; gcc::pass_manager *passes = g->get_passes (); set_cfun (NULL); @@ -2004,6 +2036,14 @@ ipa_passes (void) gimple_register_cfg_hooks (); bitmap_obstack_initialize (NULL); + if (!in_lto_p && flag_openmp) + { + have_offload = initialize_offload (); + /* OpenMP offloading requires LTO infrastructure. */ + if (have_offload) + flag_generate_lto = 1; + } + invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_START, NULL); if (!in_lto_p) @@ -2041,7 +2081,20 @@ ipa_passes (void) targetm.asm_out.lto_start (); if (!in_lto_p) - ipa_write_summaries (); + { + if (have_offload) + { + offload_lto_mode = true; + section_name_prefix = OMP_SECTION_NAME_PREFIX; + ipa_write_summaries (); + } + if (flag_lto) + { + offload_lto_mode = false; + section_name_prefix = LTO_SECTION_NAME_PREFIX; + ipa_write_summaries (); + } + } if (flag_generate_lto) targetm.asm_out.lto_end (); @@ -2122,7 +2175,7 @@ symbol_table::compile (void) state = IPA; /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE. */ - if (flag_lto) + if (flag_lto || flag_openmp) lto_streamer_hooks_init (); /* Don't run the IPA passes if there was any error or sorry messages. */ diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c index 38f56d2..076a1e8 100644 --- a/gcc/ipa-inline-analysis.c +++ b/gcc/ipa-inline-analysis.c @@ -4010,7 +4010,7 @@ inline_generate_summary (void) /* When not optimizing, do not bother to analyze. Inlining is still done because edge redirection needs to happen there. */ - if (!optimize && !flag_lto && !flag_wpa) + if (!optimize && !flag_lto && !flag_wpa && !flag_openmp) return; function_insertion_hook_holder = @@ -4325,11 +4325,6 @@ void inline_free_summary (void) { struct cgraph_node *node; - if (!inline_edge_summary_vec.exists ()) - return; - FOR_EACH_DEFINED_FUNCTION (node) - if (!node->alias) - reset_inline_summary (node); if (function_insertion_hook_holder) symtab->remove_cgraph_insertion_hook (function_insertion_hook_holder); function_insertion_hook_holder = NULL; @@ -4345,6 +4340,11 @@ inline_free_summary (void) if (edge_duplication_hook_holder) symtab->remove_edge_duplication_hook (edge_duplication_hook_holder); edge_duplication_hook_holder = NULL; + if (!inline_edge_summary_vec.exists ()) + return; + FOR_EACH_DEFINED_FUNCTION (node) + if (!node->alias) + reset_inline_summary (node); vec_free (inline_summary_vec); inline_edge_summary_vec.release (); if (edge_predicate_pool) diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c index 0584946..78b7fc8 100644 --- a/gcc/lto-cgraph.c +++ b/gcc/lto-cgraph.c @@ -239,6 +239,9 @@ void lto_set_symtab_encoder_in_partition (lto_symtab_encoder_t encoder, symtab_node *node) { + /* Ignore not needed nodes. */ + if (!node->need_dump) + return; int index = lto_symtab_encoder_encode (encoder, node); encoder->nodes[index].in_partition = true; } @@ -321,6 +324,12 @@ referenced_from_other_partition_p (symtab_node *node, lto_symtab_encoder_t encod for (i = 0; node->iterate_referring (i, ref); i++) { + /* Ignore references from non-target functions in offload lto mode. */ + if (offload_lto_mode + && !lookup_attribute ("omp declare target", + DECL_ATTRIBUTES (ref->referring->decl))) + continue; + if (ref->referring->in_other_partition || !lto_symtab_encoder_in_partition_p (encoder, ref->referring)) return true; @@ -339,9 +348,17 @@ reachable_from_other_partition_p (struct cgraph_node *node, lto_symtab_encoder_t if (node->global.inlined_to) return false; for (e = node->callers; e; e = e->next_caller) - if (e->caller->in_other_partition - || !lto_symtab_encoder_in_partition_p (encoder, e->caller)) - return true; + { + /* Ignore references from non-target functions in offload lto mode. */ + if (offload_lto_mode + && !lookup_attribute ("omp declare target", + DECL_ATTRIBUTES (e->caller->decl))) + continue; + + if (e->caller->in_other_partition + || !lto_symtab_encoder_in_partition_p (encoder, e->caller)) + return true; + } return false; } @@ -802,6 +819,18 @@ create_references (lto_symtab_encoder_t encoder, symtab_node *node) lto_symtab_encoder_encode (encoder, ref->referred); } +/* Select what needs to be streamed out. In regular lto mode stream everything. + In offload lto mode stream only stuff marked with an attribute. */ +void +select_what_to_dump (void) +{ + struct symtab_node *snode; + FOR_EACH_SYMBOL (snode) + snode->need_dump = !offload_lto_mode + || lookup_attribute ("omp declare target", + DECL_ATTRIBUTES (snode->decl)); +} + /* Find all symbols we want to stream into given partition and insert them to encoders. diff --git a/gcc/lto-section-names.h b/gcc/lto-section-names.h index cb75230..06d2caf 100644 --- a/gcc/lto-section-names.h +++ b/gcc/lto-section-names.h @@ -25,6 +25,11 @@ along with GCC; see the file COPYING3. If not see name for the functions and static_initializers. For other types of sections a '.' and the section type are appended. */ #define LTO_SECTION_NAME_PREFIX ".gnu.lto_" +#define OMP_SECTION_NAME_PREFIX ".gnu.target_lto_" + +/* Can be either OMP_SECTION_NAME_PREFIX when we stream 'pragma omp target' + stuff, or LTO_SECTION_NAME_PREFIX for LTO case. */ +extern const char *section_name_prefix; /* Segment name for LTO sections. This is only used for Mach-O. */ diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c index 3480723..95232f9 100644 --- a/gcc/lto-streamer.c +++ b/gcc/lto-streamer.c @@ -48,6 +48,8 @@ struct lto_stats_d lto_stats; static bitmap_obstack lto_obstack; static bool lto_obstack_initialized; +bool offload_lto_mode = false; +const char *section_name_prefix = LTO_SECTION_NAME_PREFIX; /* Return a string representing LTO tag TAG. */ @@ -177,7 +179,7 @@ lto_get_section_name (int section_type, const char *name, struct lto_file_decl_d sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id); else sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); - return concat (LTO_SECTION_NAME_PREFIX, sep, add, post, NULL); + return concat (section_name_prefix, sep, add, post, NULL); } diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h index 4bec969..0016eef 100644 --- a/gcc/lto-streamer.h +++ b/gcc/lto-streamer.h @@ -831,6 +831,7 @@ bool referenced_from_this_partition_p (symtab_node *, bool reachable_from_this_partition_p (struct cgraph_node *, lto_symtab_encoder_t); lto_symtab_encoder_t compute_ltrans_boundary (lto_symtab_encoder_t encoder); +void select_what_to_dump (void); /* In lto-symtab.c. */ @@ -846,6 +847,9 @@ extern void lto_write_options (void); /* Statistics gathered during LTO, WPA and LTRANS. */ extern struct lto_stats_d lto_stats; +/* Regular or offload mode of LTO. */ +extern bool offload_lto_mode; + /* Section names corresponding to the values of enum lto_section_type. */ extern const char *lto_section_name[]; diff --git a/gcc/lto/lto-object.c b/gcc/lto/lto-object.c index 323f7b2..4ee752f 100644 --- a/gcc/lto/lto-object.c +++ b/gcc/lto/lto-object.c @@ -230,8 +230,7 @@ lto_obj_add_section (void *data, const char *name, off_t offset, void **slot; struct lto_section_list *list = loasd->list; - if (strncmp (name, LTO_SECTION_NAME_PREFIX, - strlen (LTO_SECTION_NAME_PREFIX)) != 0) + if (strncmp (name, section_name_prefix, strlen (section_name_prefix))) return 1; new_name = xstrdup (name); diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c index 0451a66..332562f 100644 --- a/gcc/lto/lto-partition.c +++ b/gcc/lto/lto-partition.c @@ -134,6 +134,7 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node) gcc_assert (c != SYMBOL_EXTERNAL && (c == SYMBOL_DUPLICATE || !symbol_partitioned_p (node))); + node->need_dump = true; lto_set_symtab_encoder_in_partition (part->encoder, node); if (symbol_partitioned_p (node)) @@ -920,6 +921,8 @@ lto_promote_cross_file_statics (void) gcc_assert (flag_wpa); + select_what_to_dump (); + /* First compute boundaries. */ n_sets = ltrans_partitions.length (); for (i = 0; i < n_sets; i++) diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c index 6cbb178..f23d997 100644 --- a/gcc/lto/lto.c +++ b/gcc/lto/lto.c @@ -2125,7 +2125,7 @@ lto_section_with_id (const char *name, unsigned HOST_WIDE_INT *id) { const char *s; - if (strncmp (name, LTO_SECTION_NAME_PREFIX, strlen (LTO_SECTION_NAME_PREFIX))) + if (strncmp (name, section_name_prefix, strlen (section_name_prefix))) return 0; s = strrchr (name, '.'); return s && sscanf (s, "." HOST_WIDE_INT_PRINT_HEX_PURE, id) == 1; @@ -2899,6 +2899,10 @@ read_cgraph_and_symbols (unsigned nfiles, const char **fnames) timevar_push (TV_IPA_LTO_DECL_IN); +#ifdef ACCEL_COMPILER + section_name_prefix = OMP_SECTION_NAME_PREFIX; +#endif + real_file_decl_data = decl_data = ggc_cleared_vec_alloc (nfiles + 1); real_file_count = nfiles; diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 82651ea..7d587b3 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -257,6 +257,16 @@ is_parallel_ctx (omp_context *ctx) } +/* Return true if CTX is for an omp target region. */ + +static inline bool +is_targetreg_ctx (omp_context *ctx) +{ + return gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET + && gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_REGION; +} + + /* Return true if CTX is for an omp task. */ static inline bool @@ -1930,9 +1940,7 @@ create_omp_child_function (omp_context *ctx, bool task_copy) { omp_context *octx; for (octx = ctx; octx; octx = octx->outer) - if (gimple_code (octx->stmt) == GIMPLE_OMP_TARGET - && gimple_omp_target_kind (octx->stmt) - == GF_OMP_TARGET_KIND_REGION) + if (is_targetreg_ctx (octx)) { target_p = true; break; @@ -2588,8 +2596,7 @@ check_omp_nesting_restrictions (gimple stmt, omp_context *ctx) break; case GIMPLE_OMP_TARGET: for (; ctx != NULL; ctx = ctx->outer) - if (gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET - && gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_REGION) + if (is_targetreg_ctx (ctx)) { const char *name; switch (gimple_omp_target_kind (stmt)) @@ -8206,6 +8213,7 @@ expand_omp_target (struct omp_region *region) if (kind == GF_OMP_TARGET_KIND_REGION) { unsigned srcidx, dstidx, num; + struct cgraph_node *node; /* If the target region needs data sent from the parent function, then the very first statement (except possible @@ -8337,6 +8345,11 @@ expand_omp_target (struct omp_region *region) push_cfun (child_cfun); cgraph_edge::rebuild_edges (); + /* Prevent IPA from removing child_fn as unreachable, since there are no + refs from the parent function to the target side child_fn. */ + node = cgraph_node::get (child_fn); + node->mark_force_output (); + /* Some EH regions might become dead, see PR34608. If pass_cleanup_cfg isn't the first pass to happen with the new child, these dead EH edges might cause problems. @@ -9207,6 +9220,19 @@ lower_omp_critical (gimple_stmt_iterator *gsi_p, omp_context *ctx) DECL_COMMON (decl) = 1; DECL_ARTIFICIAL (decl) = 1; DECL_IGNORED_P (decl) = 1; + + /* If '#pragma omp critical' is inside target region, the symbol must + have an 'omp declare target' attribute. */ + omp_context *octx; + for (octx = ctx->outer; octx; octx = octx->outer) + if (is_targetreg_ctx (octx)) + { + DECL_ATTRIBUTES (decl) + = tree_cons (get_identifier ("omp declare target"), + NULL_TREE, DECL_ATTRIBUTES (decl)); + break; + } + varpool_node::finalize_decl (decl); splay_tree_insert (critical_name_mutexes, (splay_tree_key) name, diff --git a/gcc/passes.c b/gcc/passes.c index 5001c3d..d63c913 100644 --- a/gcc/passes.c +++ b/gcc/passes.c @@ -2308,6 +2308,8 @@ ipa_write_summaries (void) if (!flag_generate_lto || seen_error ()) return; + select_what_to_dump (); + encoder = lto_symtab_encoder_new (false); /* Create the callgraph set in the same order used in