From patchwork Fri Sep 27 17:15:49 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aldy Hernandez X-Patchwork-Id: 278652 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 453A42C0427 for ; Sat, 28 Sep 2013 03:16:30 +1000 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:references :in-reply-to:content-type; q=dns; s=default; b=E5DnXl3BG/MxRnKDY l5RCCRCIfdCyKrqqpmHhcbC0YHziJcXLmXrm5cxKl7RWo631NgysznXrvULplHdW p76GmH1kKkSpIQ/C7urdcrkm//IQWfuBH7uxbn45iIaYRKpoNLALgHbjHLnFKBuv 6Emin6jzgsyZ+AZyZvaZqHcCEY= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:references :in-reply-to:content-type; s=default; bh=nSxM05TH9B6sALvA+4VBWas om/0=; b=F6uSIkA9DR7KPCz0zCboS8Oy2hlMxEGAd5Dao2qEuE/j7Qyox1uGS/U MzeQxCk+fHMSla0bSHuXQwhrAG20UKJaAp2kyGD3pgB+Zh9FB4TwjYUn2n3idZKM R1wh5tAirHo0kmdLaEhbiRh1DyH3ulZDutVS8sFekZaNQjJCnWOI= Received: (qmail 15862 invoked by alias); 27 Sep 2013 17:16:23 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 15845 invoked by uid 89); 27 Sep 2013 17:16:22 -0000 Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 27 Sep 2013 17:16:22 +0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=1.1 required=5.0 tests=AWL, BAYES_50, RP_MATCHES_RCVD, SPAM_SUBJECT autolearn=no version=3.3.2 X-HELO: mx1.redhat.com Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r8RHFpWK011945 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 27 Sep 2013 13:15:51 -0400 Received: from houston.quesejoda.com (vpn-53-34.rdu2.redhat.com [10.10.53.34]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r8RHFnmu021409; Fri, 27 Sep 2013 13:15:49 -0400 Message-ID: <5245BD45.6040604@redhat.com> Date: Fri, 27 Sep 2013 12:15:49 -0500 From: Aldy Hernandez User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130625 Thunderbird/17.0.7 MIME-Version: 1.0 To: Jakub Jelinek CC: Richard Biener , Jan Hubicka , Richard Henderson , gcc-patches , "Iyer, Balaji V" Subject: Re: OMP4/cilkplus: simd clone function mangling References: <52448B95.9020901@redhat.com> <20130927142317.GM30970@tucnak.zalov.cz> In-Reply-To: <20130927142317.GM30970@tucnak.zalov.cz> On 09/27/13 09:23, Jakub Jelinek wrote: > On Thu, Sep 26, 2013 at 02:31:33PM -0500, Aldy Hernandez wrote: >> --- a/gcc/config/i386/i386.c >> +++ b/gcc/config/i386/i386.c >> @@ -42806,6 +42806,43 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val) >> return val; >> } >> >> +/* Return the default vector mangling ISA code when none is specified >> + in a `processor' clause. */ >> + >> +static char >> +ix86_cilkplus_default_vector_mangling_isa_code (struct cgraph_node *clone >> + ATTRIBUTE_UNUSED) >> +{ >> + return 'x'; >> +} > > I think rth was suggesting using vecsize_mangle, vecsize_modifier or something else, > instead of ISA, because it won't represent the ISA on all targets. > It is just some magic letter used in mangling of the simd functions. I thought he was only talking about the local vecsize_mangle() function, not the target hooks. Fair enough, I have changed all the ISA references when they can be replaced with *mangle* or something similar. > >> + >> + /* To distinguish from an OpenMP simd clone, Cilk Plus functions to >> + be cloned have a distinctive artificial label in addition to "omp >> + declare simd". */ >> + bool cilk_clone = flag_enable_cilkplus >> + && lookup_attribute ("cilk plus elemental", >> + DECL_ATTRIBUTES (new_node->symbol.decl)); > > Formatting. I'd say it should be > bool cilk_clone > = (flag_enable_cilkplus > && lookup_attribute ("cilk plus elemental", > DECL_ATTRIBUTES (new_node->symbol.decl))); > >> + if (cilk_clone) >> + remove_attribute ("cilk plus elemental", >> + DECL_ATTRIBUTES (new_node->symbol.decl)); > > I think it doesn't make sense to remove the attribute. Done. > >> + pretty_printer vars_pp; > > Do you really need two different pretty printers? Whoops, fixed. Nice catch. > Can't you just print "_ZGV%c%c%d into pp (is pp_printf > that cheap, wouldn't it be better to pp_string (&pp, "_ZGV"), > 2 pp_character + one pp_decimal_int?), and then do the loop over > the args, which right now writes into vars_pp and finally > pp_underscore and pp_string the normally mangled name? pp_printf() would be cheap. It's only used for a few cloned functions in a compilation unit. I like printf. It's pretty and clean. Not using it, is like saving sex for your old age ;-). But just to keep you happy, I changed it...global maintainers are free to live their celibate monk lives as they see fit :). > >> +/* Create a simd clone of OLD_NODE and return it. */ >> + >> +static struct cgraph_node * >> +simd_clone_create (struct cgraph_node *old_node) >> +{ >> + struct cgraph_node *new_node; >> + new_node = cgraph_function_versioning (old_node, vNULL, NULL, NULL, false, >> + NULL, NULL, "simdclone"); >> + > > My understanding of how IPA cloning etc. works is that you first > set up various data structures describing how you change the arguments > and only then actually do cgraph_function_versioning which already during > the copying will do some of the transformations of the IL. > But perhaps those transformations are too complicated to describe for > tree-inline.c to make them for you. Sure, we can worry about that when we're actually emitting the actual clones (as discussed below), and when we start adapting the vectorizer. > >> + tree attr = lookup_attribute ("omp declare simd", >> + DECL_ATTRIBUTES (node->symbol.decl)); >> + if (!attr) >> + return; >> + do >> + { >> + struct cgraph_node *new_node = simd_clone_create (node); >> + >> + bool inbranch_clause; >> + simd_clone_clauses_extract (new_node, TREE_VALUE (attr), >> + &inbranch_clause); >> + simd_clone_compute_isa_and_simdlen (new_node); >> + simd_clone_mangle (node, new_node); > > As discussed on IRC, I was hoping that for OpenMP simd and selected > targets (e.g. i?86-linux and x86_64-linux) we could do better than that, > creating not just one or two clones as we do for Cilk+ where one can > select which CPU (and thus ISA) he wants to build the clones for, but > creating clones for all ISAs, and just based on command line options > either emit just one of them as the really optimized one and the others > just as thunks that would just call other simd clone functions or the > normal function possibly several times. The thunk sounds like a good idea, long term. How about we start by emitting all the variants up-front and then we can optimize these cases later? I was thinking, in the absence of a `simdlen' clause, we can provide a target hook that returns a vector of (struct { int hw_vector_size; char vecsize_mangle }) which would gives us the different clone variants we should generate. If the user provides `simdlen', we can continue generating just one clone (or two with *inbranch) with the present generic algorithm in simd_clone_compute_vecsize_and_simdlen(). Or do you have any other ideas? But I'd like to leave the thunking business after we get the general infrastructure working. Aldy diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 1a12eda..c3a70b6 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,44 @@ +2013-09-24 Aldy Hernandez + + * Makefile.in (omp-low.o): Depend on PRETTY_PRINT_H. + * ipa-cp.c (determine_versionability): Nodes with SIMD clones are + not versionable. + * ggc.h (ggc_alloc_cleared_simd_clone_stat): New. + * cgraph.h (enum linear_stride_type): New. + (struct simd_clone_arg): New. + (struct simd_clone): New. + (struct cgraph_node): Add `simdclone' field. + Add `has_simd_clones' field. + * omp-low.c: Add new pass_omp_simd_clone support code. + (vecsize_mangle): New. + (ipa_omp_simd_clone): New. + (simd_clone_clauses_extract): New. + (simd_clone_compute_base_data_type): New. + (simd_clone_compute_vecsize_and_simdlen): New. + (simd_clone_create): New. + (simd_clone_mangle): New. + (simd_clone_struct_allow): New. + (simd_clone_struct_copy): New. + (class argno_map): New. + (argno_map::argno_map(tree)): New. + (argno_map::~argno_map): New. + (argno_map::to_tree): New. + * tree.h (OMP_CLAUSE_LINEAR_VARIABLE_STRIDE): New. + * tree-core.h (OMP_CLAUSE_LINEAR_VARIABLE_STRIDE): Document. + * tree-pass.h (make_pass_omp_simd_clone): New. + * passes.def (pass_omp_simd_clone): New. + * target.def: Define new hook prefix "TARGET_CILKPLUS_". + (default_vecsize_mangle): New. + (max_vecsize_for_mangle): New. + * doc/tm.texi.in: Add placeholder for + TARGET_CILKPLUS_DEFAULT_VECSIZE_MANGLE and + TARGET_CILKPLUS_VECSIZE_FOR_MANGLE. + * doc/tm.texi: Regenerate. + * config/i386/i386.c (ix86_cilkplus_default_vecsize_mangle): New. + (ix86_cilkplus_vecsize_for_mangle): New. + (TARGET_CILKPLUS_DEFAULT_VECSIZE_MANGLE): New. + (TARGET_CILKPLUS_VECSIZE_FOR_MANGLE): New. + 2013-09-19 Jakub Jelinek PR tree-optimization/58472 diff --git a/gcc/Makefile.in b/gcc/Makefile.in index c006711..4fc7e48 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -2573,6 +2573,7 @@ omp-low.o : omp-low.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(TREE_H) \ $(RTL_H) $(GIMPLE_H) $(TREE_INLINE_H) langhooks.h $(DIAGNOSTIC_CORE_H) \ $(TREE_SSA_H) $(FLAGS_H) $(EXPR_H) $(DIAGNOSTIC_CORE_H) \ $(TREE_PASS_H) $(GGC_H) $(EXCEPT_H) $(SPLAY_TREE_H) $(OPTABS_H) \ + $(PRETTY_PRINT_H) \ $(CFGLOOP_H) tree-iterator.h $(TARGET_H) gt-omp-low.h tree-browser.o : tree-browser.c tree-browser.def $(CONFIG_H) $(SYSTEM_H) \ coretypes.h $(HASH_TABLE_H) $(TREE_H) $(TREE_PRETTY_PRINT_H) diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 50e8743..0552805 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -248,6 +248,70 @@ struct GTY(()) cgraph_clone_info bitmap combined_args_to_skip; }; +enum linear_stride_type { + LINEAR_STRIDE_NO, + LINEAR_STRIDE_YES_CONSTANT, + LINEAR_STRIDE_YES_VARIABLE +}; + +/* Function arguments in the original function of a SIMD clone. + Supplementary data for `struct simd_clone'. */ + +struct GTY(()) simd_clone_arg { + /* A SIMD clone's argument can be either linear (constant or + variable), uniform, or vector. If the argument is neither linear + or uniform, the default is vector. */ + + /* If the linear stride is a constant, `linear_stride' is + LINEAR_STRIDE_YES_CONSTANT, and `linear_stride_num' holds + the numeric stride. + + If the linear stride is variable, `linear_stride' is + LINEAR_STRIDE_YES_VARIABLE, and `linear_stride_num' contains + the function argument containing the stride (as an index into the + function arguments starting at 0). + + Otherwise, `linear_stride' is LINEAR_STRIDE_NO and + `linear_stride_num' is unused. */ + enum linear_stride_type linear_stride; + unsigned HOST_WIDE_INT linear_stride_num; + + /* Variable alignment if available, otherwise 0. */ + unsigned int alignment; + + /* True if variable is uniform. */ + unsigned int uniform : 1; +}; + +/* Specific data for a SIMD function clone. */ + +struct GTY(()) simd_clone { + /* Number of words in the SIMD lane associated with this clone. */ + unsigned int simdlen; + + /* Number of annotated function arguments in `args'. This is + usually the number of named arguments in FNDECL. */ + unsigned int nargs; + + /* Max hardware vector size in bits. */ + unsigned int hw_vector_size; + + /* The mangling character for a given vector size. This is is used + to determine the ISA mangling bit as specified in the Intel + Vector ABI. */ + unsigned char vecsize_mangle; + + /* True if this is the masked, in-branch version of the clone, + otherwise false. */ + unsigned int inbranch : 1; + + /* True if this is a Cilk Plus variant. */ + unsigned int cilk_elemental : 1; + + /* Annotated function arguments for the original function. */ + struct simd_clone_arg GTY((length ("%h.nargs"))) args[1]; +}; + /* The cgraph data structure. Each function decl has assigned cgraph_node listing callees and callers. */ @@ -282,6 +346,10 @@ struct GTY(()) cgraph_node { /* Declaration node used to be clone of. */ tree former_clone_of; + /* If this is a SIMD clone, this points to the SIMD specific + information for it. */ + struct simd_clone *simdclone; + /* Interprocedural passes scheduled to have their transform functions applied next time we execute local pass on them. We maintain it per-function in order to allow IPA passes to introduce new functions. */ @@ -323,6 +391,8 @@ struct GTY(()) cgraph_node { /* ?? We should be able to remove this. We have enough bits in cgraph to calculate it. */ unsigned tm_clone : 1; + /* True if this function has SIMD clones. */ + unsigned has_simd_clones : 1; /* True if this decl is a dispatcher for function versions. */ unsigned dispatcher_function : 1; }; diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 46c37d8..726ee71 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -42806,6 +42806,42 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val) return val; } +/* Return the default mangling character when no vector size can be + determined from the `processor' clause. */ + +static char +ix86_cilkplus_default_vecsize_mangle (struct cgraph_node *clone + ATTRIBUTE_UNUSED) +{ + return 'x'; +} + +/* Return the hardware vector size (in bits) for a mangling + character. */ + +static unsigned int +ix86_cilkplus_vecsize_for_mangle (char mangle) +{ + /* ?? Intel currently has no ISA encoding character for AVX-512. */ + switch (mangle) + { + case 'x': + /* xmm (SSE2). */ + return 128; + case 'y': + /* ymm1 (AVX1). */ + case 'Y': + /* ymm2 (AVX2). */ + return 256; + case 'z': + /* zmm (MIC). */ + return 512; + default: + gcc_unreachable (); + return 0; + } +} + /* Initialize the GCC target structure. */ #undef TARGET_RETURN_IN_MEMORY #define TARGET_RETURN_IN_MEMORY ix86_return_in_memory @@ -43178,6 +43214,14 @@ ix86_memmodel_check (unsigned HOST_WIDE_INT val) #undef TARGET_SPILL_CLASS #define TARGET_SPILL_CLASS ix86_spill_class +#undef TARGET_CILKPLUS_DEFAULT_VECSIZE_MANGLE +#define TARGET_CILKPLUS_DEFAULT_VECSIZE_MANGLE \ + ix86_cilkplus_default_vecsize_mangle + +#undef TARGET_CILKPLUS_VECSIZE_FOR_MANGLE +#define TARGET_CILKPLUS_VECSIZE_FOR_MANGLE \ + ix86_cilkplus_vecsize_for_mangle + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-i386.h" diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 8d220f3..8bb9d1e 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -5787,6 +5787,26 @@ The default is @code{NULL_TREE} which means to not vectorize gather loads. @end deftypefn +@deftypefn {Target Hook} char TARGET_CILKPLUS_DEFAULT_VECSIZE_MANGLE (struct cgraph_node *@var{}) +This hook should return the default mangling character when no vector +size can be determined by examining the Cilk Plus @code{processor} clause. +This is as specified in the Intel Vector ABI document. + +This hook, as well as @code{max_vector_size_for_isa} below must be set +to support the Cilk Plus @code{processor} clause. + +The only argument is a @var{cgraph_node} containing the clone. +@end deftypefn + +@deftypefn {Target Hook} {unsigned int} TARGET_CILKPLUS_VECSIZE_FOR_MANGLE (char) +This hook returns the maximum hardware vector size in bits for a given +mangling character. The character is as described in Intel's +Vector ABI (see @var{ISA} character in the section on mangling). + +This hook must be defined in order to support the Cilk Plus @code{processor} +clause. +@end deftypefn + @node Anchored Addresses @section Anchored Addresses @cindex anchored addresses diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 863e843a..db25787 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -4414,6 +4414,10 @@ address; but often a machine-dependent strategy can generate better code. @hook TARGET_VECTORIZE_BUILTIN_GATHER +@hook TARGET_CILKPLUS_DEFAULT_VECSIZE_MANGLE + +@hook TARGET_CILKPLUS_VECSIZE_FOR_MANGLE + @node Anchored Addresses @section Anchored Addresses @cindex anchored addresses diff --git a/gcc/ggc.h b/gcc/ggc.h index b31bc80..eee90c6 100644 --- a/gcc/ggc.h +++ b/gcc/ggc.h @@ -276,4 +276,11 @@ ggc_alloc_cleared_gimple_statement_d_stat (size_t s MEM_STAT_DECL) ggc_internal_cleared_alloc_stat (s PASS_MEM_STAT); } +static inline struct simd_clone * +ggc_alloc_cleared_simd_clone_stat (size_t s MEM_STAT_DECL) +{ + return (struct simd_clone *) + ggc_internal_cleared_alloc_stat (s PASS_MEM_STAT); +} + #endif diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c index 56b27b2..a04ee90 100644 --- a/gcc/ipa-cp.c +++ b/gcc/ipa-cp.c @@ -446,6 +446,13 @@ determine_versionability (struct cgraph_node *node) reason = "not a tree_versionable_function"; else if (cgraph_function_body_availability (node) <= AVAIL_OVERWRITABLE) reason = "insufficient body availability"; + else if (node->has_simd_clones) + { + /* Ideally we should clone the SIMD clones themselves and create + vector copies of them, so IPA-cp and SIMD clones can happily + coexist, but that may not be worth the effort. */ + reason = "function has SIMD clones"; + } if (reason && dump_file && !node->symbol.alias && !node->thunk.thunk_p) fprintf (dump_file, "Function %s/%i is not versionable, reason: %s.\n", diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 2d7898f..3eeafc3 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -43,6 +43,7 @@ along with GCC; see the file COPYING3. If not see #include "optabs.h" #include "cfgloop.h" #include "target.h" +#include "pretty-print.h" /* Lowering of OpenMP parallel and workshare constructs proceeds in two @@ -10287,5 +10288,450 @@ make_pass_diagnose_omp_blocks (gcc::context *ctxt) { return new pass_diagnose_omp_blocks (ctxt); } + +/* SIMD clone supporting code. */ + +/* A map for function arguments. This will map a zero-based integer + to the corresponding index into DECL_ARGUMENTS. */ +class argno_map +{ + vec tree_args; + public: + /* Default constructor declared but not implemented by design. The + only valid constructor is TREE version below. */ + argno_map (); + argno_map (tree fndecl); + + ~argno_map () { tree_args.release (); } + tree to_tree (int n); +}; + +/* FNDECL is the function containing the arguments. */ + +argno_map::argno_map (tree fndecl) +{ + tree_args.create (5); + for (tree t = DECL_ARGUMENTS (fndecl); t; t = DECL_CHAIN (t)) + tree_args.safe_push (t); +} + +/* Return the DECL corresponding to the zero-based integer index into + the function arguments. */ + +tree +argno_map::to_tree (int n) +{ + return tree_args[n]; +} + +/* Allocate a fresh `simd_clone' and return it. NARGS is the number + of arguments to reserve space for. */ + +static struct simd_clone * +simd_clone_struct_alloc (int nargs) +{ + struct simd_clone *clone_info; + int len = sizeof (struct simd_clone) + + nargs * sizeof (struct simd_clone_arg); + clone_info = ggc_alloc_cleared_simd_clone_stat (len PASS_MEM_STAT); + return clone_info; +} + +/* Make a copy of the `struct simd_clone' in FROM to TO. */ + +static inline void +simd_clone_struct_copy (struct simd_clone *to, struct simd_clone *from) +{ + memcpy (to, from, sizeof (struct simd_clone) + + from->nargs * sizeof (struct simd_clone_arg)); +} + +/* Given a simd clone in NEW_NODE, extract the simd specific + information from the OMP clauses passed in CLAUSES, and set the + relevant bits in the cgraph node. *INBRANCH_SPECIFIED is set to + TRUE if the `inbranch' or `notinbranch' clause specified, otherwise + set to FALSE. */ + +static void +simd_clone_clauses_extract (struct cgraph_node *new_node, tree clauses, + bool *inbranch_specified) +{ + tree t; + int n = 0; + *inbranch_specified = false; + for (t = DECL_ARGUMENTS (new_node->symbol.decl); t; t = DECL_CHAIN (t)) + ++n; + + /* To distinguish from an OpenMP simd clone, Cilk Plus functions to + be cloned have a distinctive artificial label in addition to "omp + declare simd". */ + bool cilk_clone + = (flag_enable_cilkplus + && lookup_attribute ("cilk plus elemental", + DECL_ATTRIBUTES (new_node->symbol.decl))); + + struct simd_clone *clone_info = simd_clone_struct_alloc (n); + clone_info->nargs = n; + clone_info->cilk_elemental = cilk_clone; + gcc_assert (!new_node->simdclone); + new_node->simdclone = clone_info; + + if (!clauses || TREE_CODE (clauses) != OMP_CLAUSE) + return; + + for (t = clauses; t; t = OMP_CLAUSE_CHAIN (t)) + { + switch (OMP_CLAUSE_CODE (t)) + { + case OMP_CLAUSE_INBRANCH: + clone_info->inbranch = 1; + *inbranch_specified = true; + break; + case OMP_CLAUSE_NOTINBRANCH: + clone_info->inbranch = 0; + *inbranch_specified = true; + break; + case OMP_CLAUSE_SIMDLEN: + clone_info->simdlen + = TREE_INT_CST_LOW (OMP_CLAUSE_SIMDLEN_EXPR (t)); + break; + case OMP_CLAUSE_LINEAR: + { + tree decl = OMP_CLAUSE_DECL (t); + tree step = OMP_CLAUSE_LINEAR_STEP (t); + int argno = TREE_INT_CST_LOW (decl); + if (OMP_CLAUSE_LINEAR_VARIABLE_STRIDE (t)) + { + clone_info->args[argno].linear_stride + = LINEAR_STRIDE_YES_VARIABLE; + clone_info->args[argno].linear_stride_num + = TREE_INT_CST_LOW (step); + gcc_assert (!TREE_INT_CST_HIGH (step)); + } + else + { + if (TREE_INT_CST_HIGH (step)) + { + /* It looks like this can't really happen, since the + front-ends generally issue: + + warning: integer constant is too large for its type. + + But let's assume somehow we got past all that. */ + warning_at (DECL_SOURCE_LOCATION (decl), 0, + "ignoring large linear step"); + } + else + { + clone_info->args[argno].linear_stride + = LINEAR_STRIDE_YES_CONSTANT; + clone_info->args[argno].linear_stride_num + = TREE_INT_CST_LOW (step); + } + } + break; + } + case OMP_CLAUSE_UNIFORM: + { + tree decl = OMP_CLAUSE_DECL (t); + int argno = tree_low_cst (decl, 1); + clone_info->args[argno].uniform = 1; + break; + } + case OMP_CLAUSE_ALIGNED: + { + tree decl = OMP_CLAUSE_DECL (t); + int argno = tree_low_cst (decl, 1); + clone_info->args[argno].alignment + = TREE_INT_CST_LOW (OMP_CLAUSE_ALIGNED_ALIGNMENT (t)); + break; + } + default: + break; + } + } +} + +/* Helper function for mangling vectors. Given a vector size in bits, + return the corresponding mangling character. */ + +static char +vecsize_mangle (unsigned int vecsize) +{ + switch (vecsize) + { + /* The Intel Vector ABI does not provide a mangling character + for a 64-bit ISA, but this feels like it's keeping with the + design. */ + case 64: return 'w'; + + case 128: return 'x'; + case 256: return 'y'; + case 512: return 'z'; + default: + /* FIXME: We must come up with a default mangling bit. */ + return 'x'; + } +} + +/* Given a SIMD clone in NEW_NODE, calculate the characteristic data + type and return the coresponding type. The characteristic data + type is computed as described in the Intel Vector ABI. */ + +static tree +simd_clone_compute_base_data_type (struct cgraph_node *new_node) +{ + tree type = integer_type_node; + tree fndecl = new_node->symbol.decl; + + /* a) For non-void function, the characteristic data type is the + return type. */ + if (TREE_CODE (TREE_TYPE (TREE_TYPE (fndecl))) != VOID_TYPE) + type = TREE_TYPE (TREE_TYPE (fndecl)); + + /* b) If the function has any non-uniform, non-linear parameters, + then the characteristic data type is the type of the first + such parameter. */ + else + { + argno_map map (fndecl); + for (unsigned int i = 0; i < new_node->simdclone->nargs; ++i) + { + struct simd_clone_arg arg = new_node->simdclone->args[i]; + if (!arg.uniform && arg.linear_stride == LINEAR_STRIDE_NO) + { + type = TREE_TYPE (map.to_tree (i)); + break; + } + } + } + + /* c) If the characteristic data type determined by a) or b) above + is struct, union, or class type which is pass-by-value (except + for the type that maps to the built-in complex data type), the + characteristic data type is int. */ + if (RECORD_OR_UNION_TYPE_P (type) + && !aggregate_value_p (type, NULL) + && TREE_CODE (type) != COMPLEX_TYPE) + return integer_type_node; + + /* d) If none of the above three classes is applicable, the + characteristic data type is int. */ + + return type; + + /* e) For Intel Xeon Phi native and offload compilation, if the + resulting characteristic data type is 8-bit or 16-bit integer + data type, the characteristic data type is int. */ + /* Well, we don't handle Xeon Phi yet. */ +} + +/* Given a SIMD clone in NEW_NODE, compute simdlen and vector size, + and store them in NEW_NODE->simdclone. */ + +static void +simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *new_node) +{ + char vmangle = new_node->simdclone->vecsize_mangle; + /* Vector size for this clone. */ + unsigned int vecsize = 0; + /* Base vector type, based on function arguments. */ + tree base_type = simd_clone_compute_base_data_type (new_node); + unsigned int base_type_size = GET_MODE_BITSIZE (TYPE_MODE (base_type)); + + /* Calculate everything for Cilk Plus clones with appropriate target + support. This is as specified in the Intel Vector ABI. + + Note: Any target which supports the Cilk Plus processor clause + must also provide appropriate target hooks for calculating + default ISA/processor (default_vecsize_mangle), and for + calculating hardware vector size based on ISA/processor + (vecsize_for_mangle). */ + if (new_node->simdclone->cilk_elemental + && targetm.cilkplus.default_vecsize_mangle) + { + if (!vmangle) + vmangle = targetm.cilkplus.default_vecsize_mangle (new_node); + vecsize = targetm.cilkplus.vecsize_for_mangle (vmangle); + if (!new_node->simdclone->simdlen) + new_node->simdclone->simdlen = vecsize / base_type_size; + } + /* Calculate everything else generically. */ + else + { + vecsize = GET_MODE_BITSIZE (targetm.vectorize.preferred_simd_mode + (TYPE_MODE (base_type))); + vmangle = vecsize_mangle (vecsize); + if (!new_node->simdclone->simdlen) + new_node->simdclone->simdlen = vecsize / base_type_size; + } + new_node->simdclone->vecsize_mangle = vmangle; + new_node->simdclone->hw_vector_size = vecsize; +} + +static void +simd_clone_mangle (struct cgraph_node *old_node, struct cgraph_node *new_node) +{ + char vecsize_mangle = new_node->simdclone->vecsize_mangle; + char mask = new_node->simdclone->inbranch ? 'M' : 'N'; + unsigned int simdlen = new_node->simdclone->simdlen; + unsigned int n; + pretty_printer pp; + + gcc_assert (vecsize_mangle && simdlen); + + pp_string (&pp, "_ZGV"); + pp_character (&pp, vecsize_mangle); + pp_character (&pp, mask); + pp_decimal_int (&pp, simdlen); + + for (n = 0; n < new_node->simdclone->nargs; ++n) + { + struct simd_clone_arg arg = new_node->simdclone->args[n]; + + if (arg.uniform) + pp_character (&pp, 'u'); + else if (arg.linear_stride == LINEAR_STRIDE_YES_CONSTANT) + { + gcc_assert (arg.linear_stride_num != 0); + pp_character (&pp, 'l'); + if (arg.linear_stride_num > 1) + pp_unsigned_wide_integer (&pp, + arg.linear_stride_num); + } + else if (arg.linear_stride == LINEAR_STRIDE_YES_VARIABLE) + { + pp_character (&pp, 's'); + pp_unsigned_wide_integer (&pp, arg.linear_stride_num); + } + else + pp_character (&pp, 'v'); + if (arg.alignment) + { + pp_character (&pp, 'a'); + pp_decimal_int (&pp, arg.alignment); + } + } + + pp_underscore (&pp); + pp_string (&pp, + IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (old_node->symbol.decl))); + const char *str = pp_formatted_text (&pp); + change_decl_assembler_name (new_node->symbol.decl, + get_identifier (str)); +} + +/* Create a simd clone of OLD_NODE and return it. */ + +static struct cgraph_node * +simd_clone_create (struct cgraph_node *old_node) +{ + struct cgraph_node *new_node; + new_node = cgraph_function_versioning (old_node, vNULL, NULL, NULL, false, + NULL, NULL, "simdclone"); + + /* Keep cgraph friends from removing the clone. */ + new_node->symbol.externally_visible + = old_node->symbol.externally_visible; + TREE_PUBLIC (new_node->symbol.decl) = TREE_PUBLIC (old_node->symbol.decl); + old_node->has_simd_clones = true; + + DECL_ATTRIBUTES (new_node->symbol.decl) + = remove_attribute ("omp declare simd", + DECL_ATTRIBUTES (new_node->symbol.decl)); + + return new_node; +} + +/* If the function in NODE is tagged as an elemental SIMD function, + create the appropriate SIMD clones. */ + +static void +expand_simd_clones (struct cgraph_node *node) +{ + if (cgraph_function_body_availability (node) < AVAIL_OVERWRITABLE) + return; + + tree attr = lookup_attribute ("omp declare simd", + DECL_ATTRIBUTES (node->symbol.decl)); + if (!attr) + return; + do + { + struct cgraph_node *new_node = simd_clone_create (node); + + bool inbranch_clause; + simd_clone_clauses_extract (new_node, TREE_VALUE (attr), + &inbranch_clause); + simd_clone_compute_vecsize_and_simdlen (new_node); + simd_clone_mangle (node, new_node); + + // FIXME: Adjust clone parameters to their appropriate vector types. + + /* If no inbranch clause was specified, we need both variants. + We have already created the not-in-branch version above, by + virtue of .inbranch being clear. Create the masked in-branch + version. */ + if (!inbranch_clause) + { + struct cgraph_node *n = simd_clone_create (node); + struct simd_clone *clone + = simd_clone_struct_alloc (new_node->simdclone->nargs); + simd_clone_struct_copy (clone, new_node->simdclone); + clone->inbranch = 1; + n->simdclone = clone; + simd_clone_mangle (node, n); + } + } + while ((attr = lookup_attribute ("omp declare simd", TREE_CHAIN (attr)))); +} + +/* Entry point for IPA simd clone creation pass. */ + +static unsigned int +ipa_omp_simd_clone (void) +{ + struct cgraph_node *node; + FOR_EACH_DEFINED_FUNCTION (node) + expand_simd_clones (node); + return 0; +} + +namespace { + +const pass_data pass_data_omp_simd_clone = +{ + SIMPLE_IPA_PASS, /* type */ + "simdclone", /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + true, /* has_gate */ + true, /* has_execute */ + TV_NONE, /* tv_id */ + ( PROP_ssa | PROP_cfg ), /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; + +class pass_omp_simd_clone : public simple_ipa_opt_pass +{ +public: + pass_omp_simd_clone(gcc::context *ctxt) + : simple_ipa_opt_pass(pass_data_omp_simd_clone, ctxt) + {} + + /* opt_pass methods: */ + bool gate () { return flag_openmp || flag_enable_cilkplus; } + unsigned int execute () { return ipa_omp_simd_clone (); } +}; + +} // anon namespace + +simple_ipa_opt_pass * +make_pass_omp_simd_clone (gcc::context *ctxt) +{ + return new pass_omp_simd_clone (ctxt); +} #include "gt-omp-low.h" diff --git a/gcc/passes.def b/gcc/passes.def index 84eb3f3..6803399 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -97,6 +97,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_feedback_split_functions); POP_INSERT_PASSES () NEXT_PASS (pass_ipa_increase_alignment); + NEXT_PASS (pass_omp_simd_clone); NEXT_PASS (pass_ipa_tm); NEXT_PASS (pass_ipa_lower_emutls); TERMINATE_PASS_LIST () diff --git a/gcc/target.def b/gcc/target.def index 6de513f..92cbd73 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -1508,6 +1508,35 @@ hook_int_uint_mode_1) HOOK_VECTOR_END (sched) +/* Functions relating to Cilk Plus. */ +#undef HOOK_PREFIX +#define HOOK_PREFIX "TARGET_CILKPLUS_" +HOOK_VECTOR (TARGET_CILKPLUS, cilkplus) + +DEFHOOK +(default_vecsize_mangle, +"This hook should return the default mangling character when no vector\n\ +size can be determined by examining the Cilk Plus @code{processor} clause.\n\ +This is as specified in the Intel Vector ABI document.\n\ +\n\ +This hook, as well as @code{max_vector_size_for_isa} below must be set\n\ +to support the Cilk Plus @code{processor} clause.\n\ +\n\ +The only argument is a @var{cgraph_node} containing the clone.", +char, (struct cgraph_node *), NULL) + +DEFHOOK +(vecsize_for_mangle, +"This hook returns the maximum hardware vector size in bits for a given\n\ +mangling character. The character is as described in Intel's\n\ +Vector ABI (see @var{ISA} character in the section on mangling).\n\ +\n\ +This hook must be defined in order to support the Cilk Plus @code{processor}\n\ +clause.", +unsigned int, (char), NULL) + +HOOK_VECTOR_END (cilkplus) + /* Functions relating to vectorization. */ #undef HOOK_PREFIX #define HOOK_PREFIX "TARGET_VECTORIZE_" diff --git a/gcc/testsuite/gcc.dg/gomp/simd-clones-1.c b/gcc/testsuite/gcc.dg/gomp/simd-clones-1.c new file mode 100644 index 0000000..486b67a --- /dev/null +++ b/gcc/testsuite/gcc.dg/gomp/simd-clones-1.c @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-options "-fopenmp -fdump-tree-optimized -O3" } */ + +/* Test that functions that have SIMD clone counterparts are not + cloned by IPA-cp. For example, special_add() below has SIMD clones + created for it. However, if IPA-cp later decides to clone a + specialization of special_add(x, 666) when analyzing fillit(), we + will forever keep the vectorizer from using the SIMD versions of + special_add in a loop. + + If IPA-CP gets taught how to adjust the SIMD clones as well, this + test could be removed. */ + +#pragma omp declare simd simdlen(4) +static int __attribute__ ((noinline)) +special_add (int x, int y) +{ + if (y == 666) + return x + y + 123; + else + return x + y; +} + +void fillit(int *tot) +{ + int i; + + for (i=0; i < 10000; ++i) + tot[i] = special_add (i, 666); +} + +/* { dg-final { scan-tree-dump-not "special_add.constprop" "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/gomp/simd-clones-2.c b/gcc/testsuite/gcc.dg/gomp/simd-clones-2.c new file mode 100644 index 0000000..8ab3131 --- /dev/null +++ b/gcc/testsuite/gcc.dg/gomp/simd-clones-2.c @@ -0,0 +1,21 @@ +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */ +/* { dg-options "-fopenmp -fdump-tree-optimized -O -msse2" } */ + +#pragma omp declare simd inbranch uniform(c) linear(b:66) // addit.simdclone.2 +#pragma omp declare simd notinbranch aligned(c:32) // addit.simdclone.1 +int addit(int a, int b, int c) +{ + return a + b; +} + +#pragma omp declare simd uniform(a) aligned(a:32) linear(k:1) notinbranch +float setArray(float *a, float x, int k) +{ + a[k] = a[k] + x; + return a[k]; +} + +/* { dg-final { scan-tree-dump "clone.0 \\(_ZGVxN4ua32vl_setArray" "optimized" } } */ +/* { dg-final { scan-tree-dump "clone.1 \\(_ZGVxN4vvva32_addit" "optimized" } } */ +/* { dg-final { scan-tree-dump "clone.2 \\(_ZGVxM4vl66u_addit" "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/gomp/simd-clones-3.c b/gcc/testsuite/gcc.dg/gomp/simd-clones-3.c new file mode 100644 index 0000000..1ce9692 --- /dev/null +++ b/gcc/testsuite/gcc.dg/gomp/simd-clones-3.c @@ -0,0 +1,15 @@ +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */ +/* { dg-options "-fopenmp -fdump-tree-optimized -O -msse2" } */ + +/* Test that if there is no *inbranch clauses, that both the masked and + the unmasked version are created. */ + +#pragma omp declare simd +int addit(int a, int b, int c) +{ + return a + b; +} + +/* { dg-final { scan-tree-dump "clone.* \\(_ZGVxN4vvv_addit" "optimized" } } */ +/* { dg-final { scan-tree-dump "clone.* \\(_ZGVxM4vvv_addit" "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ diff --git a/gcc/tree-core.h b/gcc/tree-core.h index 4a0d437..c436b72 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -885,6 +885,9 @@ struct GTY(()) tree_base { CALL_ALLOCA_FOR_VAR_P in CALL_EXPR + OMP_CLAUSE_LINEAR_VARIABLE_STRIDE in + OMP_CLAUSE_LINEAR + side_effects_flag: TREE_SIDE_EFFECTS in diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h index ea1a62f..718f259 100644 --- a/gcc/tree-pass.h +++ b/gcc/tree-pass.h @@ -474,6 +474,7 @@ extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt); extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt); extern ipa_opt_pass_d *make_pass_ipa_lto_finish_out (gcc::context *ctxt); extern simple_ipa_opt_pass *make_pass_ipa_tm (gcc::context *ctxt); +extern simple_ipa_opt_pass *make_pass_omp_simd_clone (gcc::context *ctxt); extern ipa_opt_pass_d *make_pass_ipa_profile (gcc::context *ctxt); extern ipa_opt_pass_d *make_pass_ipa_cdtor_merge (gcc::context *ctxt); diff --git a/gcc/tree.h b/gcc/tree.h index b13cb2b..3e9818c 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -1318,6 +1318,10 @@ extern void protected_set_expr_location (tree, location_t); #define OMP_CLAUSE_LINEAR_NO_COPYOUT(NODE) \ TREE_PRIVATE (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_LINEAR)) +/* True if a LINEAR clause has a stride that is variable. */ +#define OMP_CLAUSE_LINEAR_VARIABLE_STRIDE(NODE) \ + TREE_PROTECTED (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_LINEAR)) + #define OMP_CLAUSE_LINEAR_STEP(NODE) \ OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_LINEAR), 1)