From patchwork Fri Nov 22 00:34:37 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jakub Jelinek X-Patchwork-Id: 293281 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 61C012C00B9 for ; Fri, 22 Nov 2013 11:35:12 +1100 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:reply-to:mime-version :content-type; q=dns; s=default; b=Fbe7TDKNuGllM3UstcvQpLxdp3k3o t3cJNCl4E3zkNu1cTdCWxgSOcCDLi/2snm8MuDmFEBYIlac0+dgM9jTWxllUxIFs aq1ooGcP8slgwE5n5lkK4UKzuU30O3lWmTcFXTVWKSPJn/0LGIJvRG8BTUCXwF2K xj4ICiV26IxiGs= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:date :from:to:cc:subject:message-id:reply-to:mime-version :content-type; s=default; bh=wRSvmJ8O8qB9zNeIGKt7EkePT9I=; b=SGe VbXPEwzlr84tDH5zymRgG+dPsnghxPto5MFFFKv4r9TTw+6hRbwaP65XrtPTdlGm OIxQIF9RixRgMdIHuxE1KPmg/Dc4eW7AY0EP9pHA4OaTrZ0PycdjDoN4w7vKt03T xnjkemT+2vY26lpx8JFBqakbuTFQFhlLmZJP6MAg= Received: (qmail 18375 invoked by alias); 22 Nov 2013 00:35:00 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 18362 invoked by uid 89); 22 Nov 2013 00:34:58 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.8 required=5.0 tests=AWL, BAYES_99, RDNS_NONE, SPF_HELO_PASS, SPF_PASS, URIBL_BLOCKED autolearn=no version=3.3.2 X-HELO: mx1.redhat.com Received: from Unknown (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 22 Nov 2013 00:34:50 +0000 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id rAM0Yhuo000659 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 21 Nov 2013 19:34:43 -0500 Received: from tucnak.zalov.cz (vpn1-4-253.ams2.redhat.com [10.36.4.253]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id rAM0YeLo025794 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 21 Nov 2013 19:34:41 -0500 Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.14.7/8.14.7) with ESMTP id rAM0Ydeq028561; Fri, 22 Nov 2013 01:34:39 +0100 Received: (from jakub@localhost) by tucnak.zalov.cz (8.14.7/8.14.7/Submit) id rAM0YbAa028560; Fri, 22 Nov 2013 01:34:37 +0100 Date: Fri, 22 Nov 2013 01:34:37 +0100 From: Jakub Jelinek To: Richard Henderson , Richard Biener , Martin Jambor , Jan Hubicka , Aldy Hernandez Cc: gcc-patches@gcc.gnu.org Subject: [PATCH] OpenMP #pragma omp declare simd support Message-ID: <20131122003437.GS892@tucnak.redhat.com> Reply-To: Jakub Jelinek MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-IsSubscribed: yes Hi! Working virtually out of the Azores now. Here is the full OpenMP 4.0 elementals (#pragma omp declare simd) support extracted from the gomp-4_0-branch. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2013-11-21 Aldy Hernandez Jakub Jelinek * cgraph.h (enum cgraph_simd_clone_arg_type): New. (struct cgraph_simd_clone_arg, struct cgraph_simd_clone): New. (struct cgraph_node): Add simdclone and simd_clones fields. * config/i386/i386.c (ix86_simd_clone_compute_vecsize_and_simdlen, ix86_simd_clone_adjust, ix86_simd_clone_usable): New functions. (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN, TARGET_SIMD_CLONE_ADJUST, TARGET_SIMD_CLONE_USABLE): Define. * doc/tm.texi.in (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN, TARGET_SIMD_CLONE_ADJUST, TARGET_SIMD_CLONE_USABLE): Add. * doc/tm.texi: Regenerated. * expr.c (store_constructor): Allow CONSTRUCTOR with VECTOR_TYPE (same sized) elements even if the type of the CONSTRUCTOR has vector mode and target is a REG. * ggc.h (ggc_alloc_cleared_simd_clone_stat): New function. * ipa.c (symtab_remove_unreachable_nodes): If node with simd clones is kept, keep also the simd clones. * ipa-cp.c (determine_versionability): Fail if node->simd_clones or node->simdclone is non-NULL. (initialize_node_lattices): Set disable = true for simd clones. * ipa-prop.c (get_vector_of_formal_parm_types): Renamed to ... (ipa_get_vector_of_formal_parm_types): ... this. No longer static. (ipa_modify_formal_parameters): Adjust caller. Remove synth_parm_prefix argument. Use operator enum instead of bit fields. Add assert for properly handling vector of references. Handle creating brand new parameters. (ipa_modify_call_arguments): Use operator enum instead of bit fields. (ipa_combine_adjustments): Same. Assert that IPA_PARM_OP_NEW is not used. (ipa_modify_expr, get_ssa_base_param, ipa_get_adjustment_candidate): New functions. (ipa_dump_param_adjustments): Rename reduction to new_decl. Use operator enum instead of bit fields. * ipa-prop.h (enum ipa_parm_op): New. (struct ipa_parm_adjustment): New fields op, simdlen. Rename reduction to new_decl, new_arg_prefix to arg_prefix and remove remove_param and copy_param. (ipa_modify_formal_parameters): Remove last argument. (ipa_get_vector_of_formal_parm_types, ipa_modify_expr, ipa_get_adjustment_candidate): New prototypes. * omp-low.c: Include pretty-print.h and ipa-prop.h. (simd_clone_vector_of_formal_parm_types): New function. (simd_clone_struct_alloc, simd_clone_struct_copy, simd_clone_vector_of_formal_parm_types, simd_clone_clauses_extract, simd_clone_compute_base_data_type, simd_clone_mangle, simd_clone_create, simd_clone_adjust_return_type, create_tmp_simd_array, simd_clone_adjust_argument_types, simd_clone_init_simd_arrays): New functions. (struct modify_stmt_info): New type. (ipa_simd_modify_stmt_ops, ipa_simd_modify_function_body, simd_clone_adjust, expand_simd_clones, ipa_omp_simd_clone): New functions. (pass_data_omp_simd_clone): New variable. (pass_omp_simd_clone): New class. (make_pass_omp_simd_clone): New function. * passes.def (pass_omp_simd_clone): New. * target.def (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN, TARGET_SIMD_CLONE_ADJUST, TARGET_SIMD_CLONE_USABLE): New target hooks. * target.h (struct cgraph_node, struct cgraph_simd_node): Declare. * tree-core.h (OMP_CLAUSE_LINEAR_VARIABLE_STRIDE): Document. * tree.h (OMP_CLAUSE_LINEAR_VARIABLE_STRIDE): Define. * tree-pass.h (make_pass_omp_simd_clone): New prototype. * tree-sra.c (turn_representatives_into_adjustments): Use operator enum. Set arg_prefix. (get_adjustment_for_base): Use operator enum. (sra_ipa_modify_expr): Rename to ipa_modify_expr and move to ipa-prop.c. (sra_ipa_modify_assign): Rename sra_ipa_modify_expr to ipa_modify_expr. (ipa_sra_modify_function_body): Same. No longer static. (sra_ipa_reset_debug_stmts): Use operator enum. (modify_function): Do not pass prefix argument. * tree-vect-data-refs.c: Include cgraph.h. (vect_analyze_data_refs): Inline by hand find_data_references_in_loop and find_data_references_in_bb, if find_data_references_in_stmt fails, still allow calls to #pragma omp declare simd functions in #pragma omp simd loops unless they contain data references among the call arguments or in lhs. * tree-vectorizer.h (enum stmt_vec_info_type): Add call_simd_clone_vec_info_type. * tree-vect-stmts.c: Include tree-ssa-loop.h, tree-scalar-evolution.h and cgraph.h. (vectorizable_call): Handle calls without lhs. (struct simd_call_arg_info): New. (vectorizable_simd_clone_call): New function. (vect_analyze_stmt, vect_transform_stmt): Call it. c/ * c-decl.c (c_builtin_function_ext_scope): Avoid binding if external_scope is NULL. cp/ * semantics.c (finish_omp_clauses): For #pragma omp declare simd linear clause step call maybe_constant_value. testsuite/ * g++.dg/gomp/declare-simd-1.C (f38): Make sure simdlen is a power of two. * gcc.dg/gomp/simd-clones-2.c: Compile on all targets. Remove -msse2. Adjust regexps for name mangling changes. * gcc.dg/gomp/simd-clones-3.c: Likewise. * gcc.dg/vect/vect-simd-clone-1.c: New test. * gcc.dg/vect/vect-simd-clone-2.c: New test. * gcc.dg/vect/vect-simd-clone-3.c: New test. * gcc.dg/vect/vect-simd-clone-4.c: New test. * gcc.dg/vect/vect-simd-clone-5.c: New test. * gcc.dg/vect/vect-simd-clone-6.c: New test. * gcc.dg/vect/vect-simd-clone-7.c: New test. * gcc.dg/vect/vect-simd-clone-8.c: New test. * gcc.dg/vect/vect-simd-clone-9.c: New test. * gcc.dg/vect/vect-simd-clone-10.c: New test. * gcc.dg/vect/vect-simd-clone-10.h: New file. * gcc.dg/vect/vect-simd-clone-10a.c: New file. Jakub --- gcc/cgraph.h (.../trunk) (revision 205223) +++ gcc/cgraph.h (.../branches/gomp-4_0-branch) (revision 205231) @@ -256,6 +261,99 @@ struct GTY(()) cgraph_clone_info bitmap combined_args_to_skip; }; +enum cgraph_simd_clone_arg_type +{ + SIMD_CLONE_ARG_TYPE_VECTOR, + SIMD_CLONE_ARG_TYPE_UNIFORM, + SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP, + SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP, + SIMD_CLONE_ARG_TYPE_MASK +}; + +/* Function arguments in the original function of a SIMD clone. + Supplementary data for `struct simd_clone'. */ + +struct GTY(()) cgraph_simd_clone_arg { + /* Original function argument as it originally existed in + DECL_ARGUMENTS. */ + tree orig_arg; + + /* orig_arg's function (or for extern functions type from + TYPE_ARG_TYPES). */ + tree orig_type; + + /* If argument is a vector, this holds the vector version of + orig_arg that after adjusting the argument types will live in + DECL_ARGUMENTS. Otherwise, this is NULL. + + This basically holds: + vector(simdlen) __typeof__(orig_arg) new_arg. */ + tree vector_arg; + + /* vector_arg's type (or for extern functions new vector type. */ + tree vector_type; + + /* If argument is a vector, this holds the array where the simd + argument is held while executing the simd clone function. This + is a local variable in the cloned function. Its content is + copied from vector_arg upon entry to the clone. + + This basically holds: + __typeof__(orig_arg) simd_array[simdlen]. */ + tree simd_array; + + /* A SIMD clone's argument can be either linear (constant or + variable), uniform, or vector. */ + enum cgraph_simd_clone_arg_type arg_type; + + /* For arg_type SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP this is + the constant linear step, if arg_type is + SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP, this is index of + the uniform argument holding the step, otherwise 0. */ + HOST_WIDE_INT linear_step; + + /* Variable alignment if available, otherwise 0. */ + unsigned int alignment; +}; + +/* Specific data for a SIMD function clone. */ + +struct GTY(()) cgraph_simd_clone { + /* Number of words in the SIMD lane associated with this clone. */ + unsigned int simdlen; + + /* Number of annotated function arguments in `args'. This is + usually the number of named arguments in FNDECL. */ + unsigned int nargs; + + /* Max hardware vector size in bits for integral vectors. */ + unsigned int vecsize_int; + + /* Max hardware vector size in bits for floating point vectors. */ + unsigned int vecsize_float; + + /* The mangling character for a given vector size. This is is used + to determine the ISA mangling bit as specified in the Intel + Vector ABI. */ + unsigned char vecsize_mangle; + + /* True if this is the masked, in-branch version of the clone, + otherwise false. */ + unsigned int inbranch : 1; + + /* True if this is a Cilk Plus variant. */ + unsigned int cilk_elemental : 1; + + /* Doubly linked list of SIMD clones. */ + struct cgraph_node *prev_clone, *next_clone; + + /* Original cgraph node the SIMD clones were created for. */ + struct cgraph_node *origin; + + /* Annotated function arguments for the original function. */ + struct cgraph_simd_clone_arg GTY((length ("%h.nargs"))) args[1]; +}; + /* The cgraph data structure. Each function decl has assigned cgraph_node listing callees and callers. */ @@ -284,6 +382,12 @@ public: /* Declaration node used to be clone of. */ tree former_clone_of; + /* If this is a SIMD clone, this points to the SIMD specific + information for it. */ + struct cgraph_simd_clone *simdclone; + /* If this function has SIMD clones, this points to the first clone. */ + struct cgraph_node *simd_clones; + /* Interprocedural passes scheduled to have their transform functions applied next time we execute local pass on them. We maintain it per-function in order to allow IPA passes to introduce new functions. */ --- gcc/config/i386/i386.c (.../trunk) (revision 205223) +++ gcc/config/i386/i386.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -43683,6 +43683,172 @@ ix86_memmodel_check (unsigned HOST_WIDE_ return val; } +/* Set CLONEI->vecsize_mangle, CLONEI->vecsize_int, + CLONEI->vecsize_float and if CLONEI->simdlen is 0, also + CLONEI->simdlen. Return 0 if SIMD clones shouldn't be emitted, + or number of vecsize_mangle variants that should be emitted. */ + +static int +ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, + struct cgraph_simd_clone *clonei, + tree base_type, int num) +{ + int ret = 1; + + if (clonei->simdlen + && (clonei->simdlen < 2 + || clonei->simdlen > 16 + || (clonei->simdlen & (clonei->simdlen - 1)) != 0)) + { + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, + "unsupported simdlen %d\n", clonei->simdlen); + return 0; + } + + tree ret_type = TREE_TYPE (TREE_TYPE (node->decl)); + if (TREE_CODE (ret_type) != VOID_TYPE) + switch (TYPE_MODE (ret_type)) + { + case QImode: + case HImode: + case SImode: + case DImode: + case SFmode: + case DFmode: + /* case SCmode: */ + /* case DCmode: */ + break; + default: + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, + "unsupported return type %qT for simd\n", ret_type); + return 0; + } + + tree t; + int i; + + for (t = DECL_ARGUMENTS (node->decl), i = 0; t; t = DECL_CHAIN (t), i++) + /* FIXME: Shouldn't we allow such arguments if they are uniform? */ + switch (TYPE_MODE (TREE_TYPE (t))) + { + case QImode: + case HImode: + case SImode: + case DImode: + case SFmode: + case DFmode: + /* case SCmode: */ + /* case DCmode: */ + break; + default: + warning_at (DECL_SOURCE_LOCATION (node->decl), 0, + "unsupported argument type %qT for simd\n", TREE_TYPE (t)); + return 0; + } + + if (clonei->cilk_elemental) + { + /* Parse here processor clause. If not present, default to 'b'. */ + clonei->vecsize_mangle = 'b'; + } + else + { + clonei->vecsize_mangle = "bcd"[num]; + ret = 3; + } + switch (clonei->vecsize_mangle) + { + case 'b': + clonei->vecsize_int = 128; + clonei->vecsize_float = 128; + break; + case 'c': + clonei->vecsize_int = 128; + clonei->vecsize_float = 256; + break; + case 'd': + clonei->vecsize_int = 256; + clonei->vecsize_float = 256; + break; + } + if (clonei->simdlen == 0) + { + if (SCALAR_INT_MODE_P (TYPE_MODE (base_type))) + clonei->simdlen = clonei->vecsize_int; + else + clonei->simdlen = clonei->vecsize_float; + clonei->simdlen /= GET_MODE_BITSIZE (TYPE_MODE (base_type)); + if (clonei->simdlen > 16) + clonei->simdlen = 16; + } + return ret; +} + +/* Add target attribute to SIMD clone NODE if needed. */ + +static void +ix86_simd_clone_adjust (struct cgraph_node *node) +{ + const char *str = NULL; + gcc_assert (node->decl == cfun->decl); + switch (node->simdclone->vecsize_mangle) + { + case 'b': + if (!TARGET_SSE2) + str = "sse2"; + break; + case 'c': + if (!TARGET_AVX) + str = "avx"; + break; + case 'd': + if (!TARGET_AVX2) + str = "avx2"; + break; + default: + gcc_unreachable (); + } + if (str == NULL) + return; + push_cfun (NULL); + tree args = build_tree_list (NULL_TREE, build_string (strlen (str), str)); + bool ok = ix86_valid_target_attribute_p (node->decl, NULL, args, 0); + gcc_assert (ok); + pop_cfun (); + ix86_previous_fndecl = NULL_TREE; + ix86_set_current_function (node->decl); +} + +/* If SIMD clone NODE can't be used in a vectorized loop + in current function, return -1, otherwise return a badness of using it + (0 if it is most desirable from vecsize_mangle point of view, 1 + slightly less desirable, etc.). */ + +static int +ix86_simd_clone_usable (struct cgraph_node *node) +{ + switch (node->simdclone->vecsize_mangle) + { + case 'b': + if (!TARGET_SSE2) + return -1; + if (!TARGET_AVX) + return 0; + return TARGET_AVX2 ? 2 : 1; + case 'c': + if (!TARGET_AVX) + return -1; + return TARGET_AVX2 ? 1 : 0; + break; + case 'd': + if (!TARGET_AVX2) + return -1; + return 0; + default: + gcc_unreachable (); + } +} + /* Implement TARGET_FLOAT_EXCEPTIONS_ROUNDING_SUPPORTED_P. */ static bool @@ -44171,6 +44337,18 @@ ix86_atomic_assign_expand_fenv (tree *ho #undef TARGET_SPILL_CLASS #define TARGET_SPILL_CLASS ix86_spill_class +#undef TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN +#define TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN \ + ix86_simd_clone_compute_vecsize_and_simdlen + +#undef TARGET_SIMD_CLONE_ADJUST +#define TARGET_SIMD_CLONE_ADJUST \ + ix86_simd_clone_adjust + +#undef TARGET_SIMD_CLONE_USABLE +#define TARGET_SIMD_CLONE_USABLE \ + ix86_simd_clone_usable + #undef TARGET_FLOAT_EXCEPTIONS_ROUNDING_SUPPORTED_P #define TARGET_FLOAT_EXCEPTIONS_ROUNDING_SUPPORTED_P \ ix86_float_exceptions_rounding_supported_p --- gcc/doc/tm.texi (.../trunk) (revision 205223) +++ gcc/doc/tm.texi (.../branches/gomp-4_0-branch) (revision 205231) @@ -5818,6 +5818,26 @@ The default is @code{NULL_TREE} which me loads. @end deftypefn +@deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int}) +This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float} +fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also +@var{simdlen} field if it was previously 0. +The hook should return 0 if SIMD clones shouldn't be emitted, +or number of @var{vecsize_mangle} variants that should be emitted. +@end deftypefn + +@deftypefn {Target Hook} void TARGET_SIMD_CLONE_ADJUST (struct cgraph_node *@var{}) +This hook should add implicit @code{attribute(target("..."))} attribute +to SIMD clone @var{node} if needed. +@end deftypefn + +@deftypefn {Target Hook} int TARGET_SIMD_CLONE_USABLE (struct cgraph_node *@var{}) +This hook should return -1 if SIMD clone @var{node} shouldn't be used +in vectorized loops in current function, or non-negative number if it is +usable. In that case, the smaller the number is, the more desirable it is +to use it. +@end deftypefn + @node Anchored Addresses @section Anchored Addresses @cindex anchored addresses --- gcc/doc/tm.texi.in (.../trunk) (revision 205223) +++ gcc/doc/tm.texi.in (.../branches/gomp-4_0-branch) (revision 205231) @@ -4422,6 +4422,12 @@ address; but often a machine-dependent @hook TARGET_VECTORIZE_BUILTIN_GATHER +@hook TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN + +@hook TARGET_SIMD_CLONE_ADJUST + +@hook TARGET_SIMD_CLONE_USABLE + @node Anchored Addresses @section Anchored Addresses @cindex anchored addresses --- gcc/expr.c (.../trunk) (revision 205223) +++ gcc/expr.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -6305,6 +6305,18 @@ store_constructor (tree exp, rtx target, enum machine_mode mode = GET_MODE (target); icode = (int) optab_handler (vec_init_optab, mode); + /* Don't use vec_init if some elements have VECTOR_TYPE. */ + if (icode != CODE_FOR_nothing) + { + tree value; + + FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), idx, value) + if (TREE_CODE (TREE_TYPE (value)) == VECTOR_TYPE) + { + icode = CODE_FOR_nothing; + break; + } + } if (icode != CODE_FOR_nothing) { unsigned int i; @@ -6382,8 +6394,8 @@ store_constructor (tree exp, rtx target, if (vector) { - /* Vector CONSTRUCTORs should only be built from smaller - vectors in the case of BLKmode vectors. */ + /* vec_init should not be used if there are VECTOR_TYPE + elements. */ gcc_assert (TREE_CODE (TREE_TYPE (value)) != VECTOR_TYPE); RTVEC_ELT (vector, eltpos) = expand_normal (value); --- gcc/ggc.h (.../trunk) (revision 205223) +++ gcc/ggc.h (.../branches/gomp-4_0-branch) (revision 205231) @@ -276,4 +276,11 @@ ggc_alloc_cleared_gimple_statement_stat ggc_internal_cleared_alloc_stat (s PASS_MEM_STAT); } +static inline struct simd_clone * +ggc_alloc_cleared_simd_clone_stat (size_t s MEM_STAT_DECL) +{ + return (struct simd_clone *) + ggc_internal_cleared_alloc_stat (s PASS_MEM_STAT); +} + #endif --- gcc/ipa.c (.../trunk) (revision 205223) +++ gcc/ipa.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -247,7 +247,7 @@ walk_polymorphic_call_targets (pointer_s hope calls to them will be devirtualized. Again we remove them after inlining. In late optimization some - devirtualization may happen, but it is not importnat since we won't inline + devirtualization may happen, but it is not important since we won't inline the call. In theory early opts and IPA should work out all important cases. - virtual clones needs bodies of their origins for later materialization; @@ -275,7 +275,7 @@ walk_polymorphic_call_targets (pointer_s by reachable symbols or origins of clones). The queue is represented as linked list by AUX pointer terminated by 1. - A the end we keep all reachable symbols. For symbols in boundary we always + At the end we keep all reachable symbols. For symbols in boundary we always turn definition into a declaration, but we may keep function body around based on body_needed_for_clonning @@ -427,6 +427,19 @@ symtab_remove_unreachable_nodes (bool be enqueue_node (cnode, &first, reachable); } } + + } + /* If any reachable function has simd clones, mark them as + reachable as well. */ + if (cnode->simd_clones) + { + cgraph_node *next; + for (next = cnode->simd_clones; + next; + next = next->simdclone->next_clone) + if (in_boundary_p + || !pointer_set_insert (reachable, next)) + enqueue_node (next, &first, reachable); } } /* When we see constructor of external variable, keep referred nodes in the --- gcc/ipa-cp.c (.../trunk) (revision 205223) +++ gcc/ipa-cp.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -429,6 +429,15 @@ determine_versionability (struct cgraph_ reason = "not a tree_versionable_function"; else if (cgraph_function_body_availability (node) <= AVAIL_OVERWRITABLE) reason = "insufficient body availability"; + else if (node->simd_clones != NULL) + { + /* Ideally we should clone the SIMD clones themselves and create + vector copies of them, so IPA-cp and SIMD clones can happily + coexist, but that may not be worth the effort. */ + reason = "function has SIMD clones"; + } + else if (node->simdclone != NULL) + reason = "function is SIMD clone"; if (reason && dump_file && !node->alias && !node->thunk.thunk_p) fprintf (dump_file, "Function %s/%i is not versionable, reason: %s.\n", @@ -695,6 +704,8 @@ initialize_node_lattices (struct cgraph_ else disable = true; } + else if (node->simdclone) + disable = true; if (disable || variable) { --- gcc/ipa-prop.c (.../trunk) (revision 205223) +++ gcc/ipa-prop.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -3355,8 +3355,8 @@ ipa_get_vector_of_formal_parms (tree fnd /* Return a heap allocated vector containing types of formal parameters of function type FNTYPE. */ -static inline vec -get_vector_of_formal_parm_types (tree fntype) +vec +ipa_get_vector_of_formal_parm_types (tree fntype) { vec types; int count = 0; @@ -3378,32 +3378,22 @@ get_vector_of_formal_parm_types (tree fn base_index field. */ void -ipa_modify_formal_parameters (tree fndecl, ipa_parm_adjustment_vec adjustments, - const char *synth_parm_prefix) +ipa_modify_formal_parameters (tree fndecl, ipa_parm_adjustment_vec adjustments) { - vec oparms, otypes; - tree orig_type, new_type = NULL; - tree old_arg_types, t, new_arg_types = NULL; - tree parm, *link = &DECL_ARGUMENTS (fndecl); - int i, len = adjustments.length (); - tree new_reversed = NULL; - bool care_for_types, last_parm_void; - - if (!synth_parm_prefix) - synth_parm_prefix = "SYNTH"; - - oparms = ipa_get_vector_of_formal_parms (fndecl); - orig_type = TREE_TYPE (fndecl); - old_arg_types = TYPE_ARG_TYPES (orig_type); + vec oparms = ipa_get_vector_of_formal_parms (fndecl); + tree orig_type = TREE_TYPE (fndecl); + tree old_arg_types = TYPE_ARG_TYPES (orig_type); /* The following test is an ugly hack, some functions simply don't have any arguments in their type. This is probably a bug but well... */ - care_for_types = (old_arg_types != NULL_TREE); + bool care_for_types = (old_arg_types != NULL_TREE); + bool last_parm_void; + vec otypes; if (care_for_types) { last_parm_void = (TREE_VALUE (tree_last (old_arg_types)) == void_type_node); - otypes = get_vector_of_formal_parm_types (orig_type); + otypes = ipa_get_vector_of_formal_parm_types (orig_type); if (last_parm_void) gcc_assert (oparms.length () + 1 == otypes.length ()); else @@ -3415,16 +3405,23 @@ ipa_modify_formal_parameters (tree fndec otypes.create (0); } - for (i = 0; i < len; i++) + int len = adjustments.length (); + tree *link = &DECL_ARGUMENTS (fndecl); + tree new_arg_types = NULL; + for (int i = 0; i < len; i++) { struct ipa_parm_adjustment *adj; gcc_assert (link); adj = &adjustments[i]; - parm = oparms[adj->base_index]; + tree parm; + if (adj->op == IPA_PARM_OP_NEW) + parm = NULL; + else + parm = oparms[adj->base_index]; adj->base = parm; - if (adj->copy_param) + if (adj->op == IPA_PARM_OP_COPY) { if (care_for_types) new_arg_types = tree_cons (NULL_TREE, otypes[adj->base_index], @@ -3432,23 +3429,36 @@ ipa_modify_formal_parameters (tree fndec *link = parm; link = &DECL_CHAIN (parm); } - else if (!adj->remove_param) + else if (adj->op != IPA_PARM_OP_REMOVE) { tree new_parm; tree ptype; - if (adj->by_ref) - ptype = build_pointer_type (adj->type); + if (adj->simdlen) + { + /* If we have a non-null simdlen but by_ref is true, we + want a vector of pointers. Build the vector of + pointers here, not a pointer to a vector in the + adj->by_ref case below. */ + ptype = build_vector_type (adj->type, adj->simdlen); + } + else if (adj->by_ref) + { + ptype = build_pointer_type (adj->type); + } else - ptype = adj->type; + { + gcc_checking_assert (!adj->by_ref || adj->simdlen); + ptype = adj->type; + } if (care_for_types) new_arg_types = tree_cons (NULL_TREE, ptype, new_arg_types); new_parm = build_decl (UNKNOWN_LOCATION, PARM_DECL, NULL_TREE, ptype); - DECL_NAME (new_parm) = create_tmp_var_name (synth_parm_prefix); - + const char *prefix = adj->arg_prefix ? adj->arg_prefix : "SYNTH"; + DECL_NAME (new_parm) = create_tmp_var_name (prefix); DECL_ARTIFICIAL (new_parm) = 1; DECL_ARG_TYPE (new_parm) = ptype; DECL_CONTEXT (new_parm) = fndecl; @@ -3456,17 +3466,20 @@ ipa_modify_formal_parameters (tree fndec DECL_IGNORED_P (new_parm) = 1; layout_decl (new_parm, 0); - adj->base = parm; - adj->reduction = new_parm; + if (adj->op == IPA_PARM_OP_NEW) + adj->base = NULL; + else + adj->base = parm; + adj->new_decl = new_parm; *link = new_parm; - link = &DECL_CHAIN (new_parm); } } *link = NULL_TREE; + tree new_reversed = NULL; if (care_for_types) { new_reversed = nreverse (new_arg_types); @@ -3484,8 +3497,9 @@ ipa_modify_formal_parameters (tree fndec Exception is METHOD_TYPEs must have THIS argument. When we are asked to remove it, we need to build new FUNCTION_TYPE instead. */ + tree new_type = NULL; if (TREE_CODE (orig_type) != METHOD_TYPE - || (adjustments[0].copy_param + || (adjustments[0].op == IPA_PARM_OP_COPY && adjustments[0].base_index == 0)) { new_type = build_distinct_type_copy (orig_type); @@ -3509,7 +3523,7 @@ ipa_modify_formal_parameters (tree fndec /* This is a new type, not a copy of an old type. Need to reassociate variants. We can handle everything except the main variant lazily. */ - t = TYPE_MAIN_VARIANT (orig_type); + tree t = TYPE_MAIN_VARIANT (orig_type); if (orig_type != t) { TYPE_MAIN_VARIANT (new_type) = t; @@ -3558,13 +3572,13 @@ ipa_modify_call_arguments (struct cgraph adj = &adjustments[i]; - if (adj->copy_param) + if (adj->op == IPA_PARM_OP_COPY) { tree arg = gimple_call_arg (stmt, adj->base_index); vargs.quick_push (arg); } - else if (!adj->remove_param) + else if (adj->op != IPA_PARM_OP_REMOVE) { tree expr, base, off; location_t loc; @@ -3683,7 +3697,7 @@ ipa_modify_call_arguments (struct cgraph NULL, true, GSI_SAME_STMT); vargs.quick_push (expr); } - if (!adj->copy_param && MAY_HAVE_DEBUG_STMTS) + if (adj->op != IPA_PARM_OP_COPY && MAY_HAVE_DEBUG_STMTS) { unsigned int ix; tree ddecl = NULL_TREE, origin = DECL_ORIGIN (adj->base), arg; @@ -3758,6 +3772,124 @@ ipa_modify_call_arguments (struct cgraph free_dominance_info (CDI_DOMINATORS); } +/* If the expression *EXPR should be replaced by a reduction of a parameter, do + so. ADJUSTMENTS is a pointer to a vector of adjustments. CONVERT + specifies whether the function should care about type incompatibility the + current and new expressions. If it is false, the function will leave + incompatibility issues to the caller. Return true iff the expression + was modified. */ + +bool +ipa_modify_expr (tree *expr, bool convert, + ipa_parm_adjustment_vec adjustments) +{ + struct ipa_parm_adjustment *cand + = ipa_get_adjustment_candidate (&expr, &convert, adjustments, false); + if (!cand) + return false; + + tree src; + if (cand->by_ref) + src = build_simple_mem_ref (cand->new_decl); + else + src = cand->new_decl; + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "About to replace expr "); + print_generic_expr (dump_file, *expr, 0); + fprintf (dump_file, " with "); + print_generic_expr (dump_file, src, 0); + fprintf (dump_file, "\n"); + } + + if (convert && !useless_type_conversion_p (TREE_TYPE (*expr), cand->type)) + { + tree vce = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (*expr), src); + *expr = vce; + } + else + *expr = src; + return true; +} + +/* If T is an SSA_NAME, return NULL if it is not a default def or + return its base variable if it is. If IGNORE_DEFAULT_DEF is true, + the base variable is always returned, regardless if it is a default + def. Return T if it is not an SSA_NAME. */ + +static tree +get_ssa_base_param (tree t, bool ignore_default_def) +{ + if (TREE_CODE (t) == SSA_NAME) + { + if (ignore_default_def || SSA_NAME_IS_DEFAULT_DEF (t)) + return SSA_NAME_VAR (t); + else + return NULL_TREE; + } + return t; +} + +/* Given an expression, return an adjustment entry specifying the + transformation to be done on EXPR. If no suitable adjustment entry + was found, returns NULL. + + If IGNORE_DEFAULT_DEF is set, consider SSA_NAMEs which are not a + default def, otherwise bail on them. + + If CONVERT is non-NULL, this function will set *CONVERT if the + expression provided is a component reference. ADJUSTMENTS is the + adjustments vector. */ + +ipa_parm_adjustment * +ipa_get_adjustment_candidate (tree **expr, bool *convert, + ipa_parm_adjustment_vec adjustments, + bool ignore_default_def) +{ + if (TREE_CODE (**expr) == BIT_FIELD_REF + || TREE_CODE (**expr) == IMAGPART_EXPR + || TREE_CODE (**expr) == REALPART_EXPR) + { + *expr = &TREE_OPERAND (**expr, 0); + if (convert) + *convert = true; + } + + HOST_WIDE_INT offset, size, max_size; + tree base = get_ref_base_and_extent (**expr, &offset, &size, &max_size); + if (!base || size == -1 || max_size == -1) + return NULL; + + if (TREE_CODE (base) == MEM_REF) + { + offset += mem_ref_offset (base).low * BITS_PER_UNIT; + base = TREE_OPERAND (base, 0); + } + + base = get_ssa_base_param (base, ignore_default_def); + if (!base || TREE_CODE (base) != PARM_DECL) + return NULL; + + struct ipa_parm_adjustment *cand = NULL; + unsigned int len = adjustments.length (); + for (unsigned i = 0; i < len; i++) + { + struct ipa_parm_adjustment *adj = &adjustments[i]; + + if (adj->base == base + && (adj->offset == offset || adj->op == IPA_PARM_OP_REMOVE)) + { + cand = adj; + break; + } + } + + if (!cand || cand->op == IPA_PARM_OP_COPY || cand->op == IPA_PARM_OP_REMOVE) + return NULL; + return cand; +} + /* Return true iff BASE_INDEX is in ADJUSTMENTS more than once. */ static bool @@ -3803,10 +3935,14 @@ ipa_combine_adjustments (ipa_parm_adjust struct ipa_parm_adjustment *n; n = &inner[i]; - if (n->remove_param) + if (n->op == IPA_PARM_OP_REMOVE) removals++; else - tmp.quick_push (*n); + { + /* FIXME: Handling of new arguments are not implemented yet. */ + gcc_assert (n->op != IPA_PARM_OP_NEW); + tmp.quick_push (*n); + } } adjustments.create (outlen + removals); @@ -3817,27 +3953,32 @@ ipa_combine_adjustments (ipa_parm_adjust struct ipa_parm_adjustment *in = &tmp[out->base_index]; memset (&r, 0, sizeof (r)); - gcc_assert (!in->remove_param); - if (out->remove_param) + gcc_assert (in->op != IPA_PARM_OP_REMOVE); + if (out->op == IPA_PARM_OP_REMOVE) { if (!index_in_adjustments_multiple_times_p (in->base_index, tmp)) { - r.remove_param = true; + r.op = IPA_PARM_OP_REMOVE; adjustments.quick_push (r); } continue; } + else + { + /* FIXME: Handling of new arguments are not implemented yet. */ + gcc_assert (out->op != IPA_PARM_OP_NEW); + } r.base_index = in->base_index; r.type = out->type; /* FIXME: Create nonlocal value too. */ - if (in->copy_param && out->copy_param) - r.copy_param = true; - else if (in->copy_param) + if (in->op == IPA_PARM_OP_COPY && out->op == IPA_PARM_OP_COPY) + r.op = IPA_PARM_OP_COPY; + else if (in->op == IPA_PARM_OP_COPY) r.offset = out->offset; - else if (out->copy_param) + else if (out->op == IPA_PARM_OP_COPY) r.offset = in->offset; else r.offset = in->offset + out->offset; @@ -3848,7 +3989,7 @@ ipa_combine_adjustments (ipa_parm_adjust { struct ipa_parm_adjustment *n = &inner[i]; - if (n->remove_param) + if (n->op == IPA_PARM_OP_REMOVE) adjustments.quick_push (*n); } @@ -3885,10 +4026,10 @@ ipa_dump_param_adjustments (FILE *file, fprintf (file, ", base: "); print_generic_expr (file, adj->base, 0); } - if (adj->reduction) + if (adj->new_decl) { - fprintf (file, ", reduction: "); - print_generic_expr (file, adj->reduction, 0); + fprintf (file, ", new_decl: "); + print_generic_expr (file, adj->new_decl, 0); } if (adj->new_ssa_base) { @@ -3896,9 +4037,9 @@ ipa_dump_param_adjustments (FILE *file, print_generic_expr (file, adj->new_ssa_base, 0); } - if (adj->copy_param) + if (adj->op == IPA_PARM_OP_COPY) fprintf (file, ", copy_param"); - else if (adj->remove_param) + else if (adj->op == IPA_PARM_OP_REMOVE) fprintf (file, ", remove_param"); else fprintf (file, ", offset %li", (long) adj->offset); --- gcc/ipa-prop.h (.../trunk) (revision 205223) +++ gcc/ipa-prop.h (.../branches/gomp-4_0-branch) (revision 205231) @@ -609,6 +609,27 @@ extern alloc_pool ipcp_values_pool; extern alloc_pool ipcp_sources_pool; extern alloc_pool ipcp_agg_lattice_pool; +/* Operation to be performed for the parameter in ipa_parm_adjustment + below. */ +enum ipa_parm_op { + IPA_PARM_OP_NONE, + + /* This describes a brand new parameter. + + The field `type' should be set to the new type, `arg_prefix' + should be set to the string prefix for the new DECL_NAME, and + `new_decl' will ultimately hold the newly created argument. */ + IPA_PARM_OP_NEW, + + /* This new parameter is an unmodified parameter at index base_index. */ + IPA_PARM_OP_COPY, + + /* This adjustment describes a parameter that is about to be removed + completely. Most users will probably need to book keep those so that they + don't leave behinfd any non default def ssa names belonging to them. */ + IPA_PARM_OP_REMOVE +}; + /* Structure to describe transformations of formal parameters and actual arguments. Each instance describes one new parameter and they are meant to be stored in a vector. Additionally, most users will probably want to store @@ -632,10 +653,11 @@ struct ipa_parm_adjustment arguments. */ tree alias_ptr_type; - /* The new declaration when creating/replacing a parameter. Created by - ipa_modify_formal_parameters, useful for functions modifying the body - accordingly. */ - tree reduction; + /* The new declaration when creating/replacing a parameter. Created + by ipa_modify_formal_parameters, useful for functions modifying + the body accordingly. For brand new arguments, this is the newly + created argument. */ + tree new_decl; /* New declaration of a substitute variable that we may use to replace all non-default-def ssa names when a parm decl is going away. */ @@ -645,22 +667,23 @@ struct ipa_parm_adjustment is NULL), this is going to be its nonlocalized vars value. */ tree nonlocal_value; + /* This holds the prefix to be used for the new DECL_NAME. */ + const char *arg_prefix; + /* Offset into the original parameter (for the cases when the new parameter is a component of an original one). */ HOST_WIDE_INT offset; - /* Zero based index of the original parameter this one is based on. (ATM - there is no way to insert a new parameter out of the blue because there is - no need but if it arises the code can be easily exteded to do so.) */ + /* Zero based index of the original parameter this one is based on. */ int base_index; - /* This new parameter is an unmodified parameter at index base_index. */ - unsigned copy_param : 1; - - /* This adjustment describes a parameter that is about to be removed - completely. Most users will probably need to book keep those so that they - don't leave behinfd any non default def ssa names belonging to them. */ - unsigned remove_param : 1; + /* If non-null, the parameter is a vector of `type' with this many + elements. */ + int simdlen; + + /* Whether this parameter is a new parameter, a copy of an old one, + or one about to be removed. */ + enum ipa_parm_op op; /* The parameter is to be passed by reference. */ unsigned by_ref : 1; @@ -671,8 +694,8 @@ typedef struct ipa_parm_adjustment ipa_p typedef vec ipa_parm_adjustment_vec; vec ipa_get_vector_of_formal_parms (tree fndecl); -void ipa_modify_formal_parameters (tree fndecl, ipa_parm_adjustment_vec, - const char *); +vec ipa_get_vector_of_formal_parm_types (tree fntype); +void ipa_modify_formal_parameters (tree fndecl, ipa_parm_adjustment_vec); void ipa_modify_call_arguments (struct cgraph_edge *, gimple, ipa_parm_adjustment_vec); ipa_parm_adjustment_vec ipa_combine_adjustments (ipa_parm_adjustment_vec, @@ -690,6 +713,10 @@ tree ipa_value_from_jfunc (struct ipa_no struct ipa_jump_func *jfunc); unsigned int ipcp_transform_function (struct cgraph_node *node); void ipa_dump_param (FILE *, struct ipa_node_params *info, int i); +bool ipa_modify_expr (tree *, bool, ipa_parm_adjustment_vec); +ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *, + ipa_parm_adjustment_vec, + bool); /* From tree-sra.c: */ --- gcc/omp-low.c.jj 2013-11-21 09:25:07.000000000 +0100 +++ gcc/omp-low.c 2013-11-21 22:17:19.334300797 +0100 @@ -61,6 +61,8 @@ along with GCC; see the file COPYING3. #include "omp-low.h" #include "gimple-low.h" #include "tree-cfgcleanup.h" +#include "pretty-print.h" +#include "ipa-prop.h" #include "tree-nested.h" @@ -10573,5 +10677,1151 @@ make_pass_diagnose_omp_blocks (gcc::cont { return new pass_diagnose_omp_blocks (ctxt); } + +/* SIMD clone supporting code. */ + +/* Allocate a fresh `simd_clone' and return it. NARGS is the number + of arguments to reserve space for. */ + +static struct cgraph_simd_clone * +simd_clone_struct_alloc (int nargs) +{ + struct cgraph_simd_clone *clone_info; + size_t len = (sizeof (struct cgraph_simd_clone) + + nargs * sizeof (struct cgraph_simd_clone_arg)); + clone_info = (struct cgraph_simd_clone *) + ggc_internal_cleared_alloc_stat (len PASS_MEM_STAT); + return clone_info; +} + +/* Make a copy of the `struct cgraph_simd_clone' in FROM to TO. */ + +static inline void +simd_clone_struct_copy (struct cgraph_simd_clone *to, + struct cgraph_simd_clone *from) +{ + memcpy (to, from, (sizeof (struct cgraph_simd_clone) + + from->nargs * sizeof (struct cgraph_simd_clone_arg))); +} + +/* Return vector of parameter types of function FNDECL. This uses + TYPE_ARG_TYPES if available, otherwise falls back to types of + DECL_ARGUMENTS types. */ + +vec +simd_clone_vector_of_formal_parm_types (tree fndecl) +{ + if (TYPE_ARG_TYPES (TREE_TYPE (fndecl))) + return ipa_get_vector_of_formal_parm_types (TREE_TYPE (fndecl)); + vec args = ipa_get_vector_of_formal_parms (fndecl); + unsigned int i; + tree arg; + FOR_EACH_VEC_ELT (args, i, arg) + args[i] = TREE_TYPE (args[i]); + return args; +} + +/* Given a simd function in NODE, extract the simd specific + information from the OMP clauses passed in CLAUSES, and return + the struct cgraph_simd_clone * if it should be cloned. *INBRANCH_SPECIFIED + is set to TRUE if the `inbranch' or `notinbranch' clause specified, + otherwise set to FALSE. */ + +static struct cgraph_simd_clone * +simd_clone_clauses_extract (struct cgraph_node *node, tree clauses, + bool *inbranch_specified) +{ + vec args = simd_clone_vector_of_formal_parm_types (node->decl); + tree t; + int n; + *inbranch_specified = false; + + n = args.length (); + if (n > 0 && args.last () == void_type_node) + n--; + + /* To distinguish from an OpenMP simd clone, Cilk Plus functions to + be cloned have a distinctive artificial label in addition to "omp + declare simd". */ + bool cilk_clone + = (flag_enable_cilkplus + && lookup_attribute ("cilk plus elemental", + DECL_ATTRIBUTES (node->decl))); + + /* Allocate one more than needed just in case this is an in-branch + clone which will require a mask argument. */ + struct cgraph_simd_clone *clone_info = simd_clone_struct_alloc (n + 1); + clone_info->nargs = n; + clone_info->cilk_elemental = cilk_clone; + + if (!clauses) + { + args.release (); + return clone_info; + } + clauses = TREE_VALUE (clauses); + if (!clauses || TREE_CODE (clauses) != OMP_CLAUSE) + return clone_info; + + for (t = clauses; t; t = OMP_CLAUSE_CHAIN (t)) + { + switch (OMP_CLAUSE_CODE (t)) + { + case OMP_CLAUSE_INBRANCH: + clone_info->inbranch = 1; + *inbranch_specified = true; + break; + case OMP_CLAUSE_NOTINBRANCH: + clone_info->inbranch = 0; + *inbranch_specified = true; + break; + case OMP_CLAUSE_SIMDLEN: + clone_info->simdlen + = TREE_INT_CST_LOW (OMP_CLAUSE_SIMDLEN_EXPR (t)); + break; + case OMP_CLAUSE_LINEAR: + { + tree decl = OMP_CLAUSE_DECL (t); + tree step = OMP_CLAUSE_LINEAR_STEP (t); + int argno = TREE_INT_CST_LOW (decl); + if (OMP_CLAUSE_LINEAR_VARIABLE_STRIDE (t)) + { + clone_info->args[argno].arg_type + = SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP; + clone_info->args[argno].linear_step = tree_to_shwi (step); + gcc_assert (clone_info->args[argno].linear_step >= 0 + && clone_info->args[argno].linear_step < n); + } + else + { + if (POINTER_TYPE_P (args[argno])) + step = fold_convert (ssizetype, step); + if (!tree_fits_shwi_p (step)) + { + warning_at (OMP_CLAUSE_LOCATION (t), 0, + "ignoring large linear step"); + args.release (); + return NULL; + } + else if (integer_zerop (step)) + { + warning_at (OMP_CLAUSE_LOCATION (t), 0, + "ignoring zero linear step"); + args.release (); + return NULL; + } + else + { + clone_info->args[argno].arg_type + = SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP; + clone_info->args[argno].linear_step = tree_to_shwi (step); + } + } + break; + } + case OMP_CLAUSE_UNIFORM: + { + tree decl = OMP_CLAUSE_DECL (t); + int argno = tree_to_uhwi (decl); + clone_info->args[argno].arg_type + = SIMD_CLONE_ARG_TYPE_UNIFORM; + break; + } + case OMP_CLAUSE_ALIGNED: + { + tree decl = OMP_CLAUSE_DECL (t); + int argno = tree_to_uhwi (decl); + clone_info->args[argno].alignment + = TREE_INT_CST_LOW (OMP_CLAUSE_ALIGNED_ALIGNMENT (t)); + break; + } + default: + break; + } + } + args.release (); + return clone_info; +} + +/* Given a SIMD clone in NODE, calculate the characteristic data + type and return the coresponding type. The characteristic data + type is computed as described in the Intel Vector ABI. */ + +static tree +simd_clone_compute_base_data_type (struct cgraph_node *node, + struct cgraph_simd_clone *clone_info) +{ + tree type = integer_type_node; + tree fndecl = node->decl; + + /* a) For non-void function, the characteristic data type is the + return type. */ + if (TREE_CODE (TREE_TYPE (TREE_TYPE (fndecl))) != VOID_TYPE) + type = TREE_TYPE (TREE_TYPE (fndecl)); + + /* b) If the function has any non-uniform, non-linear parameters, + then the characteristic data type is the type of the first + such parameter. */ + else + { + vec map = simd_clone_vector_of_formal_parm_types (fndecl); + for (unsigned int i = 0; i < clone_info->nargs; ++i) + if (clone_info->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR) + { + type = map[i]; + break; + } + map.release (); + } + + /* c) If the characteristic data type determined by a) or b) above + is struct, union, or class type which is pass-by-value (except + for the type that maps to the built-in complex data type), the + characteristic data type is int. */ + if (RECORD_OR_UNION_TYPE_P (type) + && !aggregate_value_p (type, NULL) + && TREE_CODE (type) != COMPLEX_TYPE) + return integer_type_node; + + /* d) If none of the above three classes is applicable, the + characteristic data type is int. */ + + return type; + + /* e) For Intel Xeon Phi native and offload compilation, if the + resulting characteristic data type is 8-bit or 16-bit integer + data type, the characteristic data type is int. */ + /* Well, we don't handle Xeon Phi yet. */ +} + +static tree +simd_clone_mangle (struct cgraph_node *node, + struct cgraph_simd_clone *clone_info) +{ + char vecsize_mangle = clone_info->vecsize_mangle; + char mask = clone_info->inbranch ? 'M' : 'N'; + unsigned int simdlen = clone_info->simdlen; + unsigned int n; + pretty_printer pp; + + gcc_assert (vecsize_mangle && simdlen); + + pp_string (&pp, "_ZGV"); + pp_character (&pp, vecsize_mangle); + pp_character (&pp, mask); + pp_decimal_int (&pp, simdlen); + + for (n = 0; n < clone_info->nargs; ++n) + { + struct cgraph_simd_clone_arg arg = clone_info->args[n]; + + if (arg.arg_type == SIMD_CLONE_ARG_TYPE_UNIFORM) + pp_character (&pp, 'u'); + else if (arg.arg_type == SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP) + { + gcc_assert (arg.linear_step != 0); + pp_character (&pp, 'l'); + if (arg.linear_step > 1) + pp_unsigned_wide_integer (&pp, arg.linear_step); + else if (arg.linear_step < 0) + { + pp_character (&pp, 'n'); + pp_unsigned_wide_integer (&pp, (-(unsigned HOST_WIDE_INT) + arg.linear_step)); + } + } + else if (arg.arg_type == SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP) + { + pp_character (&pp, 's'); + pp_unsigned_wide_integer (&pp, arg.linear_step); + } + else + pp_character (&pp, 'v'); + if (arg.alignment) + { + pp_character (&pp, 'a'); + pp_decimal_int (&pp, arg.alignment); + } + } + + pp_underscore (&pp); + pp_string (&pp, + IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl))); + const char *str = pp_formatted_text (&pp); + + /* If there already is a SIMD clone with the same mangled name, don't + add another one. This can happen e.g. for + #pragma omp declare simd + #pragma omp declare simd simdlen(8) + int foo (int, int); + if the simdlen is assumed to be 8 for the first one, etc. */ + for (struct cgraph_node *clone = node->simd_clones; clone; + clone = clone->simdclone->next_clone) + if (strcmp (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (clone->decl)), + str) == 0) + return NULL_TREE; + + return get_identifier (str); +} + +/* Create a simd clone of OLD_NODE and return it. */ + +static struct cgraph_node * +simd_clone_create (struct cgraph_node *old_node) +{ + struct cgraph_node *new_node; + if (old_node->definition) + new_node = cgraph_function_versioning (old_node, vNULL, NULL, NULL, false, + NULL, NULL, "simdclone"); + else + { + tree old_decl = old_node->decl; + tree new_decl = copy_node (old_node->decl); + DECL_NAME (new_decl) = clone_function_name (old_decl, "simdclone"); + SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl)); + SET_DECL_RTL (new_decl, NULL); + DECL_STATIC_CONSTRUCTOR (new_decl) = 0; + DECL_STATIC_DESTRUCTOR (new_decl) = 0; + new_node + = cgraph_copy_node_for_versioning (old_node, new_decl, vNULL, NULL); + cgraph_call_function_insertion_hooks (new_node); + } + if (new_node == NULL) + return new_node; + + TREE_PUBLIC (new_node->decl) = TREE_PUBLIC (old_node->decl); + + /* The function cgraph_function_versioning () will force the new + symbol local. Undo this, and inherit external visability from + the old node. */ + new_node->local.local = old_node->local.local; + new_node->externally_visible = old_node->externally_visible; + + return new_node; +} + +/* Adjust the return type of the given function to its appropriate + vector counterpart. Returns a simd array to be used throughout the + function as a return value. */ + +static tree +simd_clone_adjust_return_type (struct cgraph_node *node) +{ + tree fndecl = node->decl; + tree orig_rettype = TREE_TYPE (TREE_TYPE (fndecl)); + unsigned int veclen; + tree t; + + /* Adjust the function return type. */ + if (orig_rettype == void_type_node) + return NULL_TREE; + TREE_TYPE (fndecl) = build_distinct_type_copy (TREE_TYPE (fndecl)); + if (INTEGRAL_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl))) + || POINTER_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl)))) + veclen = node->simdclone->vecsize_int; + else + veclen = node->simdclone->vecsize_float; + veclen /= GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (fndecl)))); + if (veclen > node->simdclone->simdlen) + veclen = node->simdclone->simdlen; + if (veclen == node->simdclone->simdlen) + TREE_TYPE (TREE_TYPE (fndecl)) + = build_vector_type (TREE_TYPE (TREE_TYPE (fndecl)), + node->simdclone->simdlen); + else + { + t = build_vector_type (TREE_TYPE (TREE_TYPE (fndecl)), veclen); + t = build_array_type_nelts (t, node->simdclone->simdlen / veclen); + TREE_TYPE (TREE_TYPE (fndecl)) = t; + } + if (!node->definition) + return NULL_TREE; + + t = DECL_RESULT (fndecl); + /* Adjust the DECL_RESULT. */ + gcc_assert (TREE_TYPE (t) != void_type_node); + TREE_TYPE (t) = TREE_TYPE (TREE_TYPE (fndecl)); + relayout_decl (t); + + tree atype = build_array_type_nelts (orig_rettype, + node->simdclone->simdlen); + if (veclen != node->simdclone->simdlen) + return build1 (VIEW_CONVERT_EXPR, atype, t); + + /* Set up a SIMD array to use as the return value. */ + tree retval = create_tmp_var_raw (atype, "retval"); + gimple_add_tmp_var (retval); + return retval; +} + +/* Each vector argument has a corresponding array to be used locally + as part of the eventual loop. Create such temporary array and + return it. + + PREFIX is the prefix to be used for the temporary. + + TYPE is the inner element type. + + SIMDLEN is the number of elements. */ + +static tree +create_tmp_simd_array (const char *prefix, tree type, int simdlen) +{ + tree atype = build_array_type_nelts (type, simdlen); + tree avar = create_tmp_var_raw (atype, prefix); + gimple_add_tmp_var (avar); + return avar; +} + +/* Modify the function argument types to their corresponding vector + counterparts if appropriate. Also, create one array for each simd + argument to be used locally when using the function arguments as + part of the loop. + + NODE is the function whose arguments are to be adjusted. + + Returns an adjustment vector that will be filled describing how the + argument types will be adjusted. */ + +static ipa_parm_adjustment_vec +simd_clone_adjust_argument_types (struct cgraph_node *node) +{ + vec args; + ipa_parm_adjustment_vec adjustments; + + if (node->definition) + args = ipa_get_vector_of_formal_parms (node->decl); + else + args = simd_clone_vector_of_formal_parm_types (node->decl); + adjustments.create (args.length ()); + unsigned i, j, veclen; + struct ipa_parm_adjustment adj; + for (i = 0; i < node->simdclone->nargs; ++i) + { + memset (&adj, 0, sizeof (adj)); + tree parm = args[i]; + tree parm_type = node->definition ? TREE_TYPE (parm) : parm; + adj.base_index = i; + adj.base = parm; + + node->simdclone->args[i].orig_arg = node->definition ? parm : NULL_TREE; + node->simdclone->args[i].orig_type = parm_type; + + if (node->simdclone->args[i].arg_type != SIMD_CLONE_ARG_TYPE_VECTOR) + { + /* No adjustment necessary for scalar arguments. */ + adj.op = IPA_PARM_OP_COPY; + } + else + { + if (INTEGRAL_TYPE_P (parm_type) || POINTER_TYPE_P (parm_type)) + veclen = node->simdclone->vecsize_int; + else + veclen = node->simdclone->vecsize_float; + veclen /= GET_MODE_BITSIZE (TYPE_MODE (parm_type)); + if (veclen > node->simdclone->simdlen) + veclen = node->simdclone->simdlen; + adj.simdlen = veclen; + adj.arg_prefix = "simd"; + if (POINTER_TYPE_P (parm_type)) + adj.by_ref = 1; + adj.type = parm_type; + node->simdclone->args[i].vector_type + = build_vector_type (parm_type, veclen); + for (j = veclen; j < node->simdclone->simdlen; j += veclen) + { + adjustments.safe_push (adj); + if (j == veclen) + { + memset (&adj, 0, sizeof (adj)); + adj.op = IPA_PARM_OP_NEW; + adj.arg_prefix = "simd"; + adj.base_index = i; + adj.type = node->simdclone->args[i].vector_type; + } + } + + if (node->definition) + node->simdclone->args[i].simd_array + = create_tmp_simd_array (IDENTIFIER_POINTER (DECL_NAME (parm)), + parm_type, node->simdclone->simdlen); + } + adjustments.safe_push (adj); + } + + if (node->simdclone->inbranch) + { + tree base_type + = simd_clone_compute_base_data_type (node->simdclone->origin, + node->simdclone); + + memset (&adj, 0, sizeof (adj)); + adj.op = IPA_PARM_OP_NEW; + adj.arg_prefix = "mask"; + + adj.base_index = i; + if (INTEGRAL_TYPE_P (base_type) || POINTER_TYPE_P (base_type)) + veclen = node->simdclone->vecsize_int; + else + veclen = node->simdclone->vecsize_float; + veclen /= GET_MODE_BITSIZE (TYPE_MODE (base_type)); + if (veclen > node->simdclone->simdlen) + veclen = node->simdclone->simdlen; + adj.type = build_vector_type (base_type, veclen); + adjustments.safe_push (adj); + + for (j = veclen; j < node->simdclone->simdlen; j += veclen) + adjustments.safe_push (adj); + + /* We have previously allocated one extra entry for the mask. Use + it and fill it. */ + struct cgraph_simd_clone *sc = node->simdclone; + sc->nargs++; + if (node->definition) + { + sc->args[i].orig_arg + = build_decl (UNKNOWN_LOCATION, PARM_DECL, NULL, base_type); + sc->args[i].simd_array + = create_tmp_simd_array ("mask", base_type, sc->simdlen); + } + sc->args[i].orig_type = base_type; + sc->args[i].arg_type = SIMD_CLONE_ARG_TYPE_MASK; + } + + if (node->definition) + ipa_modify_formal_parameters (node->decl, adjustments); + else + { + tree new_arg_types = NULL_TREE, new_reversed; + bool last_parm_void = false; + if (args.length () > 0 && args.last () == void_type_node) + last_parm_void = true; + + gcc_assert (TYPE_ARG_TYPES (TREE_TYPE (node->decl))); + j = adjustments.length (); + for (i = 0; i < j; i++) + { + struct ipa_parm_adjustment *adj = &adjustments[i]; + tree ptype; + if (adj->op == IPA_PARM_OP_COPY) + ptype = args[adj->base_index]; + else if (adj->simdlen) + ptype = build_vector_type (adj->type, adj->simdlen); + else + ptype = adj->type; + new_arg_types = tree_cons (NULL_TREE, ptype, new_arg_types); + } + new_reversed = nreverse (new_arg_types); + if (last_parm_void) + { + if (new_reversed) + TREE_CHAIN (new_arg_types) = void_list_node; + else + new_reversed = void_list_node; + } + + tree new_type = build_distinct_type_copy (TREE_TYPE (node->decl)); + TYPE_ARG_TYPES (new_type) = new_reversed; + TREE_TYPE (node->decl) = new_type; + + adjustments.release (); + } + args.release (); + return adjustments; +} + +/* Initialize and copy the function arguments in NODE to their + corresponding local simd arrays. Returns a fresh gimple_seq with + the instruction sequence generated. */ + +static gimple_seq +simd_clone_init_simd_arrays (struct cgraph_node *node, + ipa_parm_adjustment_vec adjustments) +{ + gimple_seq seq = NULL; + unsigned i = 0, j = 0, k; + + for (tree arg = DECL_ARGUMENTS (node->decl); + arg; + arg = DECL_CHAIN (arg), i++, j++) + { + if (adjustments[j].op == IPA_PARM_OP_COPY) + continue; + + node->simdclone->args[i].vector_arg = arg; + + tree array = node->simdclone->args[i].simd_array; + if ((unsigned) adjustments[j].simdlen == node->simdclone->simdlen) + { + tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array))); + tree ptr = build_fold_addr_expr (array); + tree t = build2 (MEM_REF, TREE_TYPE (arg), ptr, + build_int_cst (ptype, 0)); + t = build2 (MODIFY_EXPR, TREE_TYPE (t), t, arg); + gimplify_and_add (t, &seq); + } + else + { + unsigned int simdlen = adjustments[j].simdlen; + if (node->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK) + simdlen = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg)); + tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array))); + for (k = 0; k < node->simdclone->simdlen; k += simdlen) + { + tree ptr = build_fold_addr_expr (array); + int elemsize; + if (k) + { + arg = DECL_CHAIN (arg); + j++; + } + elemsize + = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (arg)))); + tree t = build2 (MEM_REF, TREE_TYPE (arg), ptr, + build_int_cst (ptype, k * elemsize)); + t = build2 (MODIFY_EXPR, TREE_TYPE (t), t, arg); + gimplify_and_add (t, &seq); + } + } + } + return seq; +} + +/* Callback info for ipa_simd_modify_stmt_ops below. */ + +struct modify_stmt_info { + ipa_parm_adjustment_vec adjustments; + gimple stmt; + /* True if the parent statement was modified by + ipa_simd_modify_stmt_ops. */ + bool modified; +}; + +/* Callback for walk_gimple_op. + + Adjust operands from a given statement as specified in the + adjustments vector in the callback data. */ + +static tree +ipa_simd_modify_stmt_ops (tree *tp, int *walk_subtrees, void *data) +{ + struct walk_stmt_info *wi = (struct walk_stmt_info *) data; + if (!SSA_VAR_P (*tp)) + { + /* Make sure we treat subtrees as a RHS. This makes sure that + when examining the `*foo' in *foo=x, the `foo' get treated as + a use properly. */ + wi->is_lhs = false; + wi->val_only = true; + if (TYPE_P (*tp)) + *walk_subtrees = 0; + return NULL_TREE; + } + struct modify_stmt_info *info = (struct modify_stmt_info *) wi->info; + struct ipa_parm_adjustment *cand + = ipa_get_adjustment_candidate (&tp, NULL, info->adjustments, true); + if (!cand) + return NULL_TREE; + + tree t = *tp; + tree repl = make_ssa_name (TREE_TYPE (t), NULL); + + gimple stmt; + gimple_stmt_iterator gsi = gsi_for_stmt (info->stmt); + if (wi->is_lhs) + { + stmt = gimple_build_assign (unshare_expr (cand->new_decl), repl); + gsi_insert_after (&gsi, stmt, GSI_SAME_STMT); + SSA_NAME_DEF_STMT (repl) = info->stmt; + } + else + { + /* You'd think we could skip the extra SSA variable when + wi->val_only=true, but we may have `*var' which will get + replaced into `*var_array[iter]' and will likely be something + not gimple. */ + stmt = gimple_build_assign (repl, unshare_expr (cand->new_decl)); + gsi_insert_before (&gsi, stmt, GSI_SAME_STMT); + } + + if (!useless_type_conversion_p (TREE_TYPE (*tp), TREE_TYPE (repl))) + { + tree vce = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (*tp), repl); + *tp = vce; + } + else + *tp = repl; + + info->modified = true; + wi->is_lhs = false; + wi->val_only = true; + return NULL_TREE; +} + +/* Traverse the function body and perform all modifications as + described in ADJUSTMENTS. At function return, ADJUSTMENTS will be + modified such that the replacement/reduction value will now be an + offset into the corresponding simd_array. + + This function will replace all function argument uses with their + corresponding simd array elements, and ajust the return values + accordingly. */ + +static void +ipa_simd_modify_function_body (struct cgraph_node *node, + ipa_parm_adjustment_vec adjustments, + tree retval_array, tree iter) +{ + basic_block bb; + unsigned int i, j; + + /* Re-use the adjustments array, but this time use it to replace + every function argument use to an offset into the corresponding + simd_array. */ + for (i = 0, j = 0; i < node->simdclone->nargs; ++i, ++j) + { + if (!node->simdclone->args[i].vector_arg) + continue; + + tree basetype = TREE_TYPE (node->simdclone->args[i].orig_arg); + adjustments[j].new_decl + = build4 (ARRAY_REF, + basetype, + node->simdclone->args[i].simd_array, + iter, + NULL_TREE, NULL_TREE); + if (adjustments[j].op == IPA_PARM_OP_NONE + && (unsigned) adjustments[j].simdlen < node->simdclone->simdlen) + j += node->simdclone->simdlen / adjustments[j].simdlen - 1; + } + + struct modify_stmt_info info; + info.adjustments = adjustments; + + FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (node->decl)) + { + gimple_stmt_iterator gsi; + + gsi = gsi_start_bb (bb); + while (!gsi_end_p (gsi)) + { + gimple stmt = gsi_stmt (gsi); + info.stmt = stmt; + struct walk_stmt_info wi; + + memset (&wi, 0, sizeof (wi)); + info.modified = false; + wi.info = &info; + walk_gimple_op (stmt, ipa_simd_modify_stmt_ops, &wi); + + if (gimple_code (stmt) == GIMPLE_RETURN) + { + tree retval = gimple_return_retval (stmt); + if (!retval) + { + gsi_remove (&gsi, true); + continue; + } + + /* Replace `return foo' with `retval_array[iter] = foo'. */ + tree ref = build4 (ARRAY_REF, TREE_TYPE (retval), + retval_array, iter, NULL, NULL); + stmt = gimple_build_assign (ref, retval); + gsi_replace (&gsi, stmt, true); + info.modified = true; + } + + if (info.modified) + { + update_stmt (stmt); + if (maybe_clean_eh_stmt (stmt)) + gimple_purge_dead_eh_edges (gimple_bb (stmt)); + } + gsi_next (&gsi); + } + } +} + +/* Adjust the argument types in NODE to their appropriate vector + counterparts. */ + +static void +simd_clone_adjust (struct cgraph_node *node) +{ + push_cfun (DECL_STRUCT_FUNCTION (node->decl)); + + targetm.simd_clone.adjust (node); + + tree retval = simd_clone_adjust_return_type (node); + ipa_parm_adjustment_vec adjustments + = simd_clone_adjust_argument_types (node); + + push_gimplify_context (); + + gimple_seq seq = simd_clone_init_simd_arrays (node, adjustments); + + /* Adjust all uses of vector arguments accordingly. Adjust all + return values accordingly. */ + tree iter = create_tmp_var (unsigned_type_node, "iter"); + tree iter1 = make_ssa_name (iter, NULL); + tree iter2 = make_ssa_name (iter, NULL); + ipa_simd_modify_function_body (node, adjustments, retval, iter1); + + /* Initialize the iteration variable. */ + basic_block entry_bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)); + basic_block body_bb = split_block_after_labels (entry_bb)->dest; + gimple_stmt_iterator gsi = gsi_after_labels (entry_bb); + /* Insert the SIMD array and iv initialization at function + entry. */ + gsi_insert_seq_before (&gsi, seq, GSI_NEW_STMT); + + pop_gimplify_context (NULL); + + /* Create a new BB right before the original exit BB, to hold the + iteration increment and the condition/branch. */ + basic_block orig_exit = EDGE_PRED (EXIT_BLOCK_PTR_FOR_FN (cfun), 0)->src; + basic_block incr_bb = create_empty_bb (orig_exit); + /* The succ of orig_exit was EXIT_BLOCK_PTR_FOR_FN (cfun), with an empty + flag. Set it now to be a FALLTHRU_EDGE. */ + gcc_assert (EDGE_COUNT (orig_exit->succs) == 1); + EDGE_SUCC (orig_exit, 0)->flags |= EDGE_FALLTHRU; + for (unsigned i = 0; + i < EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds); ++i) + { + edge e = EDGE_PRED (EXIT_BLOCK_PTR_FOR_FN (cfun), i); + redirect_edge_succ (e, incr_bb); + } + edge e = make_edge (incr_bb, EXIT_BLOCK_PTR_FOR_FN (cfun), 0); + e->probability = REG_BR_PROB_BASE; + gsi = gsi_last_bb (incr_bb); + gimple g = gimple_build_assign_with_ops (PLUS_EXPR, iter2, iter1, + build_int_cst (unsigned_type_node, + 1)); + gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING); + + /* Mostly annotate the loop for the vectorizer (the rest is done below). */ + struct loop *loop = alloc_loop (); + cfun->has_force_vect_loops = true; + loop->safelen = node->simdclone->simdlen; + loop->force_vect = true; + loop->header = body_bb; + add_bb_to_loop (incr_bb, loop); + + /* Branch around the body if the mask applies. */ + if (node->simdclone->inbranch) + { + gimple_stmt_iterator gsi = gsi_last_bb (loop->header); + tree mask_array + = node->simdclone->args[node->simdclone->nargs - 1].simd_array; + tree mask = make_ssa_name (TREE_TYPE (TREE_TYPE (mask_array)), NULL); + tree aref = build4 (ARRAY_REF, + TREE_TYPE (TREE_TYPE (mask_array)), + mask_array, iter1, + NULL, NULL); + g = gimple_build_assign (mask, aref); + gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING); + int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (aref))); + if (!INTEGRAL_TYPE_P (TREE_TYPE (aref))) + { + aref = build1 (VIEW_CONVERT_EXPR, + build_nonstandard_integer_type (bitsize, 0), mask); + mask = make_ssa_name (TREE_TYPE (aref), NULL); + g = gimple_build_assign (mask, aref); + gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING); + } + + g = gimple_build_cond (EQ_EXPR, mask, build_zero_cst (TREE_TYPE (mask)), + NULL, NULL); + gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING); + make_edge (loop->header, incr_bb, EDGE_TRUE_VALUE); + FALLTHRU_EDGE (loop->header)->flags = EDGE_FALSE_VALUE; + } + + /* Generate the condition. */ + g = gimple_build_cond (LT_EXPR, + iter2, + build_int_cst (unsigned_type_node, + node->simdclone->simdlen), + NULL, NULL); + gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING); + e = split_block (incr_bb, gsi_stmt (gsi)); + basic_block latch_bb = e->dest; + basic_block new_exit_bb = e->dest; + new_exit_bb = split_block (latch_bb, NULL)->dest; + loop->latch = latch_bb; + + redirect_edge_succ (FALLTHRU_EDGE (latch_bb), body_bb); + + make_edge (incr_bb, new_exit_bb, EDGE_FALSE_VALUE); + /* The successor of incr_bb is already pointing to latch_bb; just + change the flags. + make_edge (incr_bb, latch_bb, EDGE_TRUE_VALUE); */ + FALLTHRU_EDGE (incr_bb)->flags = EDGE_TRUE_VALUE; + + gimple phi = create_phi_node (iter1, body_bb); + edge preheader_edge = find_edge (entry_bb, body_bb); + edge latch_edge = single_succ_edge (latch_bb); + add_phi_arg (phi, build_zero_cst (unsigned_type_node), preheader_edge, + UNKNOWN_LOCATION); + add_phi_arg (phi, iter2, latch_edge, UNKNOWN_LOCATION); + + /* Generate the new return. */ + gsi = gsi_last_bb (new_exit_bb); + if (retval + && TREE_CODE (retval) == VIEW_CONVERT_EXPR + && TREE_CODE (TREE_OPERAND (retval, 0)) == RESULT_DECL) + retval = TREE_OPERAND (retval, 0); + else if (retval) + { + retval = build1 (VIEW_CONVERT_EXPR, + TREE_TYPE (TREE_TYPE (node->decl)), + retval); + retval = force_gimple_operand_gsi (&gsi, retval, true, NULL, + false, GSI_CONTINUE_LINKING); + } + g = gimple_build_return (retval); + gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING); + + /* Handle aligned clauses by replacing default defs of the aligned + uniform args with __builtin_assume_aligned (arg_N(D), alignment) + lhs. Handle linear by adding PHIs. */ + for (unsigned i = 0; i < node->simdclone->nargs; i++) + if (node->simdclone->args[i].alignment + && node->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_UNIFORM + && (node->simdclone->args[i].alignment + & (node->simdclone->args[i].alignment - 1)) == 0 + && TREE_CODE (TREE_TYPE (node->simdclone->args[i].orig_arg)) + == POINTER_TYPE) + { + unsigned int alignment = node->simdclone->args[i].alignment; + tree orig_arg = node->simdclone->args[i].orig_arg; + tree def = ssa_default_def (cfun, orig_arg); + if (!has_zero_uses (def)) + { + tree fn = builtin_decl_explicit (BUILT_IN_ASSUME_ALIGNED); + gimple_seq seq = NULL; + bool need_cvt = false; + gimple call + = gimple_build_call (fn, 2, def, size_int (alignment)); + g = call; + if (!useless_type_conversion_p (TREE_TYPE (orig_arg), + ptr_type_node)) + need_cvt = true; + tree t = make_ssa_name (need_cvt ? ptr_type_node : orig_arg, NULL); + gimple_call_set_lhs (g, t); + gimple_seq_add_stmt_without_update (&seq, g); + if (need_cvt) + { + t = make_ssa_name (orig_arg, NULL); + g = gimple_build_assign_with_ops (NOP_EXPR, t, + gimple_call_lhs (g), + NULL_TREE); + gimple_seq_add_stmt_without_update (&seq, g); + } + gsi_insert_seq_on_edge_immediate + (single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)), seq); + + entry_bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)); + int freq = compute_call_stmt_bb_frequency (current_function_decl, + entry_bb); + cgraph_create_edge (node, cgraph_get_create_node (fn), + call, entry_bb->count, freq); + + imm_use_iterator iter; + use_operand_p use_p; + gimple use_stmt; + tree repl = gimple_get_lhs (g); + FOR_EACH_IMM_USE_STMT (use_stmt, iter, def) + if (is_gimple_debug (use_stmt) || use_stmt == call) + continue; + else + FOR_EACH_IMM_USE_ON_STMT (use_p, iter) + SET_USE (use_p, repl); + } + } + else if (node->simdclone->args[i].arg_type + == SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP) + { + tree orig_arg = node->simdclone->args[i].orig_arg; + tree def = ssa_default_def (cfun, orig_arg); + gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (orig_arg)) + || POINTER_TYPE_P (TREE_TYPE (orig_arg))); + if (!has_zero_uses (def)) + { + iter1 = make_ssa_name (orig_arg, NULL); + iter2 = make_ssa_name (orig_arg, NULL); + phi = create_phi_node (iter1, body_bb); + add_phi_arg (phi, def, preheader_edge, UNKNOWN_LOCATION); + add_phi_arg (phi, iter2, latch_edge, UNKNOWN_LOCATION); + enum tree_code code = INTEGRAL_TYPE_P (TREE_TYPE (orig_arg)) + ? PLUS_EXPR : POINTER_PLUS_EXPR; + tree addtype = INTEGRAL_TYPE_P (TREE_TYPE (orig_arg)) + ? TREE_TYPE (orig_arg) : sizetype; + tree addcst + = build_int_cst (addtype, node->simdclone->args[i].linear_step); + g = gimple_build_assign_with_ops (code, iter2, iter1, addcst); + gsi = gsi_last_bb (incr_bb); + gsi_insert_before (&gsi, g, GSI_SAME_STMT); + + imm_use_iterator iter; + use_operand_p use_p; + gimple use_stmt; + FOR_EACH_IMM_USE_STMT (use_stmt, iter, def) + if (use_stmt == phi) + continue; + else + FOR_EACH_IMM_USE_ON_STMT (use_p, iter) + SET_USE (use_p, iter1); + } + } + + calculate_dominance_info (CDI_DOMINATORS); + add_loop (loop, loop->header->loop_father); + update_ssa (TODO_update_ssa); + + pop_cfun (); +} + +/* If the function in NODE is tagged as an elemental SIMD function, + create the appropriate SIMD clones. */ + +static void +expand_simd_clones (struct cgraph_node *node) +{ + if (lookup_attribute ("noclone", DECL_ATTRIBUTES (node->decl))) + return; + + tree attr = lookup_attribute ("omp declare simd", + DECL_ATTRIBUTES (node->decl)); + if (!attr || targetm.simd_clone.compute_vecsize_and_simdlen == NULL) + return; + /* Ignore + #pragma omp declare simd + extern int foo (); + in C, there we don't know the argument types at all. */ + if (!node->definition + && TYPE_ARG_TYPES (TREE_TYPE (node->decl)) == NULL_TREE) + return; + do + { + bool inbranch_clause_specified; + struct cgraph_simd_clone *clone_info + = simd_clone_clauses_extract (node, TREE_VALUE (attr), + &inbranch_clause_specified); + if (clone_info == NULL) + continue; + + int orig_simdlen = clone_info->simdlen; + tree base_type = simd_clone_compute_base_data_type (node, clone_info); + int count + = targetm.simd_clone.compute_vecsize_and_simdlen (node, clone_info, + base_type, 0); + if (count == 0) + continue; + + for (int i = 0; i < count * 2; i++) + { + struct cgraph_simd_clone *clone = clone_info; + if (inbranch_clause_specified && (i & 1) != 0) + continue; + + if (i != 0) + { + clone = simd_clone_struct_alloc (clone_info->nargs + - clone_info->inbranch + + ((i & 1) != 0)); + simd_clone_struct_copy (clone, clone_info); + clone->nargs -= clone_info->inbranch; + clone->simdlen = orig_simdlen; + targetm.simd_clone.compute_vecsize_and_simdlen (node, clone, + base_type, + i / 2); + if ((i & 1) != 0) + clone->inbranch = 1; + } + + tree id = simd_clone_mangle (node, clone); + if (id == NULL_TREE) + continue; + + struct cgraph_node *n = simd_clone_create (node); + if (n == NULL) + continue; + + n->simdclone = clone; + clone->origin = node; + clone->next_clone = NULL; + if (node->simd_clones == NULL) + { + clone->prev_clone = n; + node->simd_clones = n; + } + else + { + clone->prev_clone = node->simd_clones->simdclone->prev_clone; + clone->prev_clone->simdclone->next_clone = n; + node->simd_clones->simdclone->prev_clone = n; + } + change_decl_assembler_name (n->decl, id); + if (node->definition) + simd_clone_adjust (n); + else + { + simd_clone_adjust_return_type (n); + simd_clone_adjust_argument_types (n); + } + } + } + while ((attr = lookup_attribute ("omp declare simd", TREE_CHAIN (attr)))); +} + +/* Entry point for IPA simd clone creation pass. */ + +static unsigned int +ipa_omp_simd_clone (void) +{ + struct cgraph_node *node; + FOR_EACH_FUNCTION (node) + expand_simd_clones (node); + return 0; +} + +namespace { + +const pass_data pass_data_omp_simd_clone = +{ + SIMPLE_IPA_PASS, /* type */ + "simdclone", /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + true, /* has_gate */ + true, /* has_execute */ + TV_NONE, /* tv_id */ + ( PROP_ssa | PROP_cfg ), /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; + +class pass_omp_simd_clone : public simple_ipa_opt_pass +{ +public: + pass_omp_simd_clone(gcc::context *ctxt) + : simple_ipa_opt_pass(pass_data_omp_simd_clone, ctxt) + {} + + /* opt_pass methods: */ + bool gate () { return flag_openmp || flag_openmp_simd + || flag_enable_cilkplus; } + unsigned int execute () { return ipa_omp_simd_clone (); } +}; + +} // anon namespace + +simple_ipa_opt_pass * +make_pass_omp_simd_clone (gcc::context *ctxt) +{ + return new pass_omp_simd_clone (ctxt); +} #include "gt-omp-low.h" --- gcc/passes.def (.../trunk) (revision 205223) +++ gcc/passes.def (.../branches/gomp-4_0-branch) (revision 205231) @@ -97,6 +97,7 @@ along with GCC; see the file COPYING3. NEXT_PASS (pass_feedback_split_functions); POP_INSERT_PASSES () NEXT_PASS (pass_ipa_increase_alignment); + NEXT_PASS (pass_omp_simd_clone); NEXT_PASS (pass_ipa_tm); NEXT_PASS (pass_ipa_lower_emutls); TERMINATE_PASS_LIST () --- gcc/target.def (.../trunk) (revision 205223) +++ gcc/target.def (.../branches/gomp-4_0-branch) (revision 205231) @@ -1521,6 +1521,36 @@ hook_int_uint_mode_1) HOOK_VECTOR_END (sched) +/* Functions relating to OpenMP and Cilk Plus SIMD clones. */ +#undef HOOK_PREFIX +#define HOOK_PREFIX "TARGET_SIMD_CLONE_" +HOOK_VECTOR (TARGET_SIMD_CLONE, simd_clone) + +DEFHOOK +(compute_vecsize_and_simdlen, +"This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}\n\ +fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also\n\ +@var{simdlen} field if it was previously 0.\n\ +The hook should return 0 if SIMD clones shouldn't be emitted,\n\ +or number of @var{vecsize_mangle} variants that should be emitted.", +int, (struct cgraph_node *, struct cgraph_simd_clone *, tree, int), NULL) + +DEFHOOK +(adjust, +"This hook should add implicit @code{attribute(target(\"...\"))} attribute\n\ +to SIMD clone @var{node} if needed.", +void, (struct cgraph_node *), NULL) + +DEFHOOK +(usable, +"This hook should return -1 if SIMD clone @var{node} shouldn't be used\n\ +in vectorized loops in current function, or non-negative number if it is\n\ +usable. In that case, the smaller the number is, the more desirable it is\n\ +to use it.", +int, (struct cgraph_node *), NULL) + +HOOK_VECTOR_END (simd_clone) + /* Functions relating to vectorization. */ #undef HOOK_PREFIX #define HOOK_PREFIX "TARGET_VECTORIZE_" --- gcc/target.h (.../trunk) (revision 205223) +++ gcc/target.h (.../branches/gomp-4_0-branch) (revision 205231) @@ -93,6 +93,8 @@ extern bool target_default_pointer_addre struct stdarg_info; struct spec_info_def; struct hard_reg_set_container; +struct cgraph_node; +struct cgraph_simd_clone; /* The struct used by the secondary_reload target hook. */ typedef struct secondary_reload_info --- gcc/tree-core.h (.../trunk) (revision 205223) +++ gcc/tree-core.h (.../branches/gomp-4_0-branch) (revision 205231) @@ -903,6 +903,9 @@ struct GTY(()) tree_base { CALL_ALLOCA_FOR_VAR_P in CALL_EXPR + OMP_CLAUSE_LINEAR_VARIABLE_STRIDE in + OMP_CLAUSE_LINEAR + side_effects_flag: TREE_SIDE_EFFECTS in --- gcc/tree.h (.../trunk) (revision 205223) +++ gcc/tree.h (.../branches/gomp-4_0-branch) (revision 205231) @@ -1346,6 +1351,10 @@ extern void protected_set_expr_location #define OMP_CLAUSE_LINEAR_NO_COPYOUT(NODE) \ TREE_PRIVATE (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_LINEAR)) +/* True if a LINEAR clause has a stride that is variable. */ +#define OMP_CLAUSE_LINEAR_VARIABLE_STRIDE(NODE) \ + TREE_PROTECTED (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_LINEAR)) + #define OMP_CLAUSE_LINEAR_STEP(NODE) \ OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_LINEAR), 1) --- gcc/tree-pass.h (.../trunk) (revision 205223) +++ gcc/tree-pass.h (.../branches/gomp-4_0-branch) (revision 205231) @@ -472,6 +472,7 @@ extern ipa_opt_pass_d *make_pass_ipa_ref extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt); extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt); extern simple_ipa_opt_pass *make_pass_ipa_tm (gcc::context *ctxt); +extern simple_ipa_opt_pass *make_pass_omp_simd_clone (gcc::context *ctxt); extern ipa_opt_pass_d *make_pass_ipa_profile (gcc::context *ctxt); extern ipa_opt_pass_d *make_pass_ipa_cdtor_merge (gcc::context *ctxt); --- gcc/tree-sra.c (.../trunk) (revision 205223) +++ gcc/tree-sra.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -4277,9 +4277,10 @@ turn_representatives_into_adjustments (v adj.base_index = get_param_index (parm, parms); adj.base = parm; if (!repr) - adj.copy_param = 1; + adj.op = IPA_PARM_OP_COPY; else - adj.remove_param = 1; + adj.op = IPA_PARM_OP_REMOVE; + adj.arg_prefix = "ISRA"; adjustments.quick_push (adj); } else @@ -4299,6 +4300,7 @@ turn_representatives_into_adjustments (v adj.by_ref = (POINTER_TYPE_P (TREE_TYPE (repr->base)) && (repr->grp_maybe_modified || repr->grp_not_necessarilly_dereferenced)); + adj.arg_prefix = "ISRA"; adjustments.quick_push (adj); } } @@ -4429,7 +4431,7 @@ get_adjustment_for_base (ipa_parm_adjust struct ipa_parm_adjustment *adj; adj = &adjustments[i]; - if (!adj->copy_param && adj->base == base) + if (adj->op != IPA_PARM_OP_COPY && adj->base == base) return adj; } @@ -4493,84 +4495,6 @@ replace_removed_params_ssa_names (gimple return true; } -/* If the expression *EXPR should be replaced by a reduction of a parameter, do - so. ADJUSTMENTS is a pointer to a vector of adjustments. CONVERT - specifies whether the function should care about type incompatibility the - current and new expressions. If it is false, the function will leave - incompatibility issues to the caller. Return true iff the expression - was modified. */ - -static bool -sra_ipa_modify_expr (tree *expr, bool convert, - ipa_parm_adjustment_vec adjustments) -{ - int i, len; - struct ipa_parm_adjustment *adj, *cand = NULL; - HOST_WIDE_INT offset, size, max_size; - tree base, src; - - len = adjustments.length (); - - if (TREE_CODE (*expr) == BIT_FIELD_REF - || TREE_CODE (*expr) == IMAGPART_EXPR - || TREE_CODE (*expr) == REALPART_EXPR) - { - expr = &TREE_OPERAND (*expr, 0); - convert = true; - } - - base = get_ref_base_and_extent (*expr, &offset, &size, &max_size); - if (!base || size == -1 || max_size == -1) - return false; - - if (TREE_CODE (base) == MEM_REF) - { - offset += mem_ref_offset (base).low * BITS_PER_UNIT; - base = TREE_OPERAND (base, 0); - } - - base = get_ssa_base_param (base); - if (!base || TREE_CODE (base) != PARM_DECL) - return false; - - for (i = 0; i < len; i++) - { - adj = &adjustments[i]; - - if (adj->base == base - && (adj->offset == offset || adj->remove_param)) - { - cand = adj; - break; - } - } - if (!cand || cand->copy_param || cand->remove_param) - return false; - - if (cand->by_ref) - src = build_simple_mem_ref (cand->reduction); - else - src = cand->reduction; - - if (dump_file && (dump_flags & TDF_DETAILS)) - { - fprintf (dump_file, "About to replace expr "); - print_generic_expr (dump_file, *expr, 0); - fprintf (dump_file, " with "); - print_generic_expr (dump_file, src, 0); - fprintf (dump_file, "\n"); - } - - if (convert && !useless_type_conversion_p (TREE_TYPE (*expr), cand->type)) - { - tree vce = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (*expr), src); - *expr = vce; - } - else - *expr = src; - return true; -} - /* If the statement pointed to by STMT_PTR contains any expressions that need to replaced with a different one as noted by ADJUSTMENTS, do so. Handle any potential type incompatibilities (GSI is used to accommodate conversion @@ -4591,8 +4515,8 @@ sra_ipa_modify_assign (gimple *stmt_ptr, rhs_p = gimple_assign_rhs1_ptr (stmt); lhs_p = gimple_assign_lhs_ptr (stmt); - any = sra_ipa_modify_expr (rhs_p, false, adjustments); - any |= sra_ipa_modify_expr (lhs_p, false, adjustments); + any = ipa_modify_expr (rhs_p, false, adjustments); + any |= ipa_modify_expr (lhs_p, false, adjustments); if (any) { tree new_rhs = NULL_TREE; @@ -4638,7 +4562,7 @@ sra_ipa_modify_assign (gimple *stmt_ptr, /* Traverse the function body and all modifications as described in ADJUSTMENTS. Return true iff the CFG has been changed. */ -static bool +bool ipa_sra_modify_function_body (ipa_parm_adjustment_vec adjustments) { bool cfg_changed = false; @@ -4664,7 +4588,7 @@ ipa_sra_modify_function_body (ipa_parm_a case GIMPLE_RETURN: t = gimple_return_retval_ptr (stmt); if (*t != NULL_TREE) - modified |= sra_ipa_modify_expr (t, true, adjustments); + modified |= ipa_modify_expr (t, true, adjustments); break; case GIMPLE_ASSIGN: @@ -4677,13 +4601,13 @@ ipa_sra_modify_function_body (ipa_parm_a for (i = 0; i < gimple_call_num_args (stmt); i++) { t = gimple_call_arg_ptr (stmt, i); - modified |= sra_ipa_modify_expr (t, true, adjustments); + modified |= ipa_modify_expr (t, true, adjustments); } if (gimple_call_lhs (stmt)) { t = gimple_call_lhs_ptr (stmt); - modified |= sra_ipa_modify_expr (t, false, adjustments); + modified |= ipa_modify_expr (t, false, adjustments); modified |= replace_removed_params_ssa_names (stmt, adjustments); } @@ -4693,12 +4617,12 @@ ipa_sra_modify_function_body (ipa_parm_a for (i = 0; i < gimple_asm_ninputs (stmt); i++) { t = &TREE_VALUE (gimple_asm_input_op (stmt, i)); - modified |= sra_ipa_modify_expr (t, true, adjustments); + modified |= ipa_modify_expr (t, true, adjustments); } for (i = 0; i < gimple_asm_noutputs (stmt); i++) { t = &TREE_VALUE (gimple_asm_output_op (stmt, i)); - modified |= sra_ipa_modify_expr (t, false, adjustments); + modified |= ipa_modify_expr (t, false, adjustments); } break; @@ -4744,7 +4668,7 @@ sra_ipa_reset_debug_stmts (ipa_parm_adju use_operand_p use_p; adj = &adjustments[i]; - if (adj->copy_param || !is_gimple_reg (adj->base)) + if (adj->op == IPA_PARM_OP_COPY || !is_gimple_reg (adj->base)) continue; name = ssa_default_def (cfun, adj->base); vexpr = NULL; @@ -4927,7 +4851,7 @@ modify_function (struct cgraph_node *nod redirect_callers.release (); push_cfun (DECL_STRUCT_FUNCTION (new_node->decl)); - ipa_modify_formal_parameters (current_function_decl, adjustments, "ISRA"); + ipa_modify_formal_parameters (current_function_decl, adjustments); cfg_changed = ipa_sra_modify_function_body (adjustments); sra_ipa_reset_debug_stmts (adjustments); convert_callers (new_node, node->decl, adjustments); --- gcc/tree-vect-data-refs.c (.../trunk) (revision 205223) +++ gcc/tree-vect-data-refs.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3. #include "tree-scalar-evolution.h" #include "tree-vectorizer.h" #include "diagnostic-core.h" +#include "cgraph.h" /* Need to include rtl.h, expr.h, etc. for optabs. */ #include "expr.h" #include "optabs.h" @@ -3163,10 +3164,11 @@ vect_analyze_data_refs (loop_vec_info lo if (loop_vinfo) { + basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo); + loop = LOOP_VINFO_LOOP (loop_vinfo); - if (!find_loop_nest (loop, &LOOP_VINFO_LOOP_NEST (loop_vinfo)) - || find_data_references_in_loop - (loop, &LOOP_VINFO_DATAREFS (loop_vinfo))) + datarefs = LOOP_VINFO_DATAREFS (loop_vinfo); + if (!find_loop_nest (loop, &LOOP_VINFO_LOOP_NEST (loop_vinfo))) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -3175,7 +3177,57 @@ vect_analyze_data_refs (loop_vec_info lo return false; } - datarefs = LOOP_VINFO_DATAREFS (loop_vinfo); + for (i = 0; i < loop->num_nodes; i++) + { + gimple_stmt_iterator gsi; + + for (gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple stmt = gsi_stmt (gsi); + if (!find_data_references_in_stmt (loop, stmt, &datarefs)) + { + if (is_gimple_call (stmt) && loop->safelen) + { + tree fndecl = gimple_call_fndecl (stmt), op; + if (fndecl != NULL_TREE) + { + struct cgraph_node *node = cgraph_get_node (fndecl); + if (node != NULL && node->simd_clones != NULL) + { + unsigned int j, n = gimple_call_num_args (stmt); + for (j = 0; j < n; j++) + { + op = gimple_call_arg (stmt, j); + if (DECL_P (op) + || (REFERENCE_CLASS_P (op) + && get_base_address (op))) + break; + } + op = gimple_call_lhs (stmt); + /* Ignore #pragma omp declare simd functions + if they don't have data references in the + call stmt itself. */ + if (j == n + && !(op + && (DECL_P (op) + || (REFERENCE_CLASS_P (op) + && get_base_address (op))))) + continue; + } + } + } + LOOP_VINFO_DATAREFS (loop_vinfo) = datarefs; + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "not vectorized: loop contains function " + "calls or data references that cannot " + "be analyzed\n"); + return false; + } + } + } + + LOOP_VINFO_DATAREFS (loop_vinfo) = datarefs; } else { --- gcc/tree-vect-loop.c (.../trunk) (revision 205223) +++ gcc/tree-vect-loop.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -373,6 +373,36 @@ vect_determine_vectorization_factor (loo if (gimple_get_lhs (stmt) == NULL_TREE) { + if (is_gimple_call (stmt)) + { + /* Ignore calls with no lhs. These must be calls to + #pragma omp simd functions, and what vectorization factor + it really needs can't be determined until + vectorizable_simd_clone_call. */ + if (STMT_VINFO_VECTYPE (stmt_info) == NULL_TREE) + { + unsigned int j, n = gimple_call_num_args (stmt); + for (j = 0; j < n; j++) + { + scalar_type = TREE_TYPE (gimple_call_arg (stmt, j)); + vectype = get_vectype_for_scalar_type (scalar_type); + if (vectype) + { + STMT_VINFO_VECTYPE (stmt_info) = vectype; + break; + } + } + } + if (STMT_VINFO_VECTYPE (stmt_info) != NULL_TREE) + { + if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si)) + { + pattern_def_seq = NULL; + gsi_next (&si); + } + continue; + } + } if (dump_enabled_p ()) { dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, --- gcc/tree-vectorizer.h (.../trunk) (revision 205223) +++ gcc/tree-vectorizer.h (.../branches/gomp-4_0-branch) (revision 205231) @@ -443,6 +443,7 @@ enum stmt_vec_info_type { shift_vec_info_type, op_vec_info_type, call_vec_info_type, + call_simd_clone_vec_info_type, assignment_vec_info_type, condition_vec_info_type, reduc_vec_info_type, --- gcc/tree-vect-stmts.c (.../trunk) (revision 205223) +++ gcc/tree-vect-stmts.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -42,12 +42,15 @@ along with GCC; see the file COPYING3. #include "tree-ssanames.h" #include "tree-ssa-loop-manip.h" #include "cfgloop.h" +#include "tree-ssa-loop.h" +#include "tree-scalar-evolution.h" #include "expr.h" #include "recog.h" /* FIXME: for insn_data */ #include "optabs.h" #include "diagnostic-core.h" #include "tree-vectorizer.h" #include "dumpfile.h" +#include "cgraph.h" /* For lang_hooks.types.type_for_mode. */ #include "langhooks.h" @@ -1736,7 +1739,8 @@ vectorizable_call (gimple stmt, gimple_s if (!is_gimple_call (stmt)) return false; - if (TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME) + if (gimple_call_lhs (stmt) == NULL_TREE + || TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME) return false; if (stmt_can_throw_internal (stmt)) @@ -2114,6 +2118,603 @@ vectorizable_call (gimple stmt, gimple_s } +struct simd_call_arg_info +{ + tree vectype; + tree op; + enum vect_def_type dt; + HOST_WIDE_INT linear_step; + unsigned int align; +}; + +/* Function vectorizable_simd_clone_call. + + Check if STMT performs a function call that can be vectorized + by calling a simd clone of the function. + If VEC_STMT is also passed, vectorize the STMT: create a vectorized + stmt to replace it, put it in VEC_STMT, and insert it at BSI. + Return FALSE if not a vectorizable STMT, TRUE otherwise. */ + +static bool +vectorizable_simd_clone_call (gimple stmt, gimple_stmt_iterator *gsi, + gimple *vec_stmt, slp_tree slp_node) +{ + tree vec_dest; + tree scalar_dest; + tree op, type; + tree vec_oprnd0 = NULL_TREE; + stmt_vec_info stmt_info = vinfo_for_stmt (stmt), prev_stmt_info; + tree vectype; + unsigned int nunits; + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); + bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info); + struct loop *loop = loop_vinfo ? LOOP_VINFO_LOOP (loop_vinfo) : NULL; + tree fndecl, new_temp, def; + gimple def_stmt; + gimple new_stmt = NULL; + int ncopies, j; + vec arginfo = vNULL; + vec vargs = vNULL; + size_t i, nargs; + tree lhs, rtype, ratype; + vec *ret_ctor_elts; + + /* Is STMT a vectorizable call? */ + if (!is_gimple_call (stmt)) + return false; + + fndecl = gimple_call_fndecl (stmt); + if (fndecl == NULL_TREE) + return false; + + struct cgraph_node *node = cgraph_get_node (fndecl); + if (node == NULL || node->simd_clones == NULL) + return false; + + if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) + return false; + + if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def) + return false; + + if (gimple_call_lhs (stmt) + && TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME) + return false; + + if (stmt_can_throw_internal (stmt)) + return false; + + vectype = STMT_VINFO_VECTYPE (stmt_info); + + if (loop_vinfo && nested_in_vect_loop_p (loop, stmt)) + return false; + + /* FORNOW */ + if (slp_node || PURE_SLP_STMT (stmt_info)) + return false; + + /* Process function arguments. */ + nargs = gimple_call_num_args (stmt); + + /* Bail out if the function has zero arguments. */ + if (nargs == 0) + return false; + + arginfo.create (nargs); + + for (i = 0; i < nargs; i++) + { + simd_call_arg_info thisarginfo; + affine_iv iv; + + thisarginfo.linear_step = 0; + thisarginfo.align = 0; + thisarginfo.op = NULL_TREE; + + op = gimple_call_arg (stmt, i); + if (!vect_is_simple_use_1 (op, stmt, loop_vinfo, bb_vinfo, + &def_stmt, &def, &thisarginfo.dt, + &thisarginfo.vectype)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "use not simple.\n"); + arginfo.release (); + return false; + } + + if (thisarginfo.vectype != NULL_TREE + && loop_vinfo + && TREE_CODE (op) == SSA_NAME + && simple_iv (loop, loop_containing_stmt (stmt), op, &iv, false) + && tree_fits_shwi_p (iv.step)) + { + thisarginfo.linear_step = tree_to_shwi (iv.step); + thisarginfo.op = iv.base; + } + else if (thisarginfo.vectype == NULL_TREE + && POINTER_TYPE_P (TREE_TYPE (op))) + thisarginfo.align = get_pointer_alignment (op) / BITS_PER_UNIT; + + arginfo.quick_push (thisarginfo); + } + + unsigned int badness = 0; + struct cgraph_node *bestn = NULL; + for (struct cgraph_node *n = node->simd_clones; n != NULL; + n = n->simdclone->next_clone) + { + unsigned int this_badness = 0; + if (n->simdclone->simdlen + > (unsigned) LOOP_VINFO_VECT_FACTOR (loop_vinfo) + || n->simdclone->nargs != nargs) + continue; + if (n->simdclone->simdlen + < (unsigned) LOOP_VINFO_VECT_FACTOR (loop_vinfo)) + this_badness += (exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo)) + - exact_log2 (n->simdclone->simdlen)) * 1024; + if (n->simdclone->inbranch) + this_badness += 2048; + int target_badness = targetm.simd_clone.usable (n); + if (target_badness < 0) + continue; + this_badness += target_badness * 512; + /* FORNOW: Have to add code to add the mask argument. */ + if (n->simdclone->inbranch) + continue; + for (i = 0; i < nargs; i++) + { + switch (n->simdclone->args[i].arg_type) + { + case SIMD_CLONE_ARG_TYPE_VECTOR: + if (!useless_type_conversion_p + (n->simdclone->args[i].orig_type, + TREE_TYPE (gimple_call_arg (stmt, i)))) + i = -1; + else if (arginfo[i].vectype == NULL_TREE + || arginfo[i].linear_step) + this_badness += 64; + break; + case SIMD_CLONE_ARG_TYPE_UNIFORM: + if (arginfo[i].vectype != NULL_TREE) + i = -1; + break; + case SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP: + if (arginfo[i].vectype == NULL_TREE + || (arginfo[i].linear_step + != n->simdclone->args[i].linear_step)) + i = -1; + break; + case SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP: + /* FORNOW */ + i = -1; + break; + case SIMD_CLONE_ARG_TYPE_MASK: + gcc_unreachable (); + } + if (i == (size_t) -1) + break; + if (n->simdclone->args[i].alignment > arginfo[i].align) + { + i = -1; + break; + } + if (arginfo[i].align) + this_badness += (exact_log2 (arginfo[i].align) + - exact_log2 (n->simdclone->args[i].alignment)); + } + if (i == (size_t) -1) + continue; + if (bestn == NULL || this_badness < badness) + { + bestn = n; + badness = this_badness; + } + } + + if (bestn == NULL) + { + arginfo.release (); + return false; + } + + for (i = 0; i < nargs; i++) + if (arginfo[i].vectype == NULL_TREE + && bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR) + { + arginfo[i].vectype + = get_vectype_for_scalar_type (TREE_TYPE (gimple_call_arg (stmt, + i))); + if (arginfo[i].vectype == NULL + || (TYPE_VECTOR_SUBPARTS (arginfo[i].vectype) + > bestn->simdclone->simdlen)) + { + arginfo.release (); + return false; + } + } + + fndecl = bestn->decl; + nunits = bestn->simdclone->simdlen; + ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; + + /* If the function isn't const, only allow it in simd loops where user + has asserted that at least nunits consecutive iterations can be + performed using SIMD instructions. */ + if ((loop == NULL || (unsigned) loop->safelen < nunits) + && gimple_vuse (stmt)) + { + arginfo.release (); + return false; + } + + /* Sanity check: make sure that at least one copy of the vectorized stmt + needs to be generated. */ + gcc_assert (ncopies >= 1); + + if (!vec_stmt) /* transformation not required. */ + { + STMT_VINFO_TYPE (stmt_info) = call_simd_clone_vec_info_type; + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "=== vectorizable_simd_clone_call ===\n"); +/* vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL); */ + arginfo.release (); + return true; + } + + /** Transform. **/ + + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, "transform call.\n"); + + /* Handle def. */ + scalar_dest = gimple_call_lhs (stmt); + vec_dest = NULL_TREE; + rtype = NULL_TREE; + ratype = NULL_TREE; + if (scalar_dest) + { + vec_dest = vect_create_destination_var (scalar_dest, vectype); + rtype = TREE_TYPE (TREE_TYPE (fndecl)); + if (TREE_CODE (rtype) == ARRAY_TYPE) + { + ratype = rtype; + rtype = TREE_TYPE (ratype); + } + } + + prev_stmt_info = NULL; + for (j = 0; j < ncopies; ++j) + { + /* Build argument list for the vectorized call. */ + if (j == 0) + vargs.create (nargs); + else + vargs.truncate (0); + + for (i = 0; i < nargs; i++) + { + unsigned int k, l, m, o; + tree atype; + op = gimple_call_arg (stmt, i); + switch (bestn->simdclone->args[i].arg_type) + { + case SIMD_CLONE_ARG_TYPE_VECTOR: + atype = bestn->simdclone->args[i].vector_type; + o = nunits / TYPE_VECTOR_SUBPARTS (atype); + for (m = j * o; m < (j + 1) * o; m++) + { + if (TYPE_VECTOR_SUBPARTS (atype) + < TYPE_VECTOR_SUBPARTS (arginfo[i].vectype)) + { + unsigned int prec = GET_MODE_BITSIZE (TYPE_MODE (atype)); + k = (TYPE_VECTOR_SUBPARTS (arginfo[i].vectype) + / TYPE_VECTOR_SUBPARTS (atype)); + gcc_assert ((k & (k - 1)) == 0); + if (m == 0) + vec_oprnd0 + = vect_get_vec_def_for_operand (op, stmt, NULL); + else + { + vec_oprnd0 = arginfo[i].op; + if ((m & (k - 1)) == 0) + vec_oprnd0 + = vect_get_vec_def_for_stmt_copy (arginfo[i].dt, + vec_oprnd0); + } + arginfo[i].op = vec_oprnd0; + vec_oprnd0 + = build3 (BIT_FIELD_REF, atype, vec_oprnd0, + build_int_cst (integer_type_node, prec), + build_int_cst (integer_type_node, + (m & (k - 1)) * prec)); + new_stmt + = gimple_build_assign_with_ops (BIT_FIELD_REF, + make_ssa_name (atype, + NULL), + vec_oprnd0, NULL_TREE); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + vargs.safe_push (gimple_assign_lhs (new_stmt)); + } + else + { + k = (TYPE_VECTOR_SUBPARTS (atype) + / TYPE_VECTOR_SUBPARTS (arginfo[i].vectype)); + gcc_assert ((k & (k - 1)) == 0); + vec *ctor_elts; + if (k != 1) + vec_alloc (ctor_elts, k); + else + ctor_elts = NULL; + for (l = 0; l < k; l++) + { + if (m == 0 && l == 0) + vec_oprnd0 + = vect_get_vec_def_for_operand (op, stmt, NULL); + else + vec_oprnd0 + = vect_get_vec_def_for_stmt_copy (arginfo[i].dt, + arginfo[i].op); + arginfo[i].op = vec_oprnd0; + if (k == 1) + break; + CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE, + vec_oprnd0); + } + if (k == 1) + vargs.safe_push (vec_oprnd0); + else + { + vec_oprnd0 = build_constructor (atype, ctor_elts); + new_stmt + = gimple_build_assign_with_ops + (CONSTRUCTOR, make_ssa_name (atype, NULL), + vec_oprnd0, NULL_TREE); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + vargs.safe_push (gimple_assign_lhs (new_stmt)); + } + } + } + break; + case SIMD_CLONE_ARG_TYPE_UNIFORM: + vargs.safe_push (op); + break; + case SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP: + if (j == 0) + { + gimple_seq stmts; + arginfo[i].op + = force_gimple_operand (arginfo[i].op, &stmts, true, + NULL_TREE); + if (stmts != NULL) + { + basic_block new_bb; + edge pe = loop_preheader_edge (loop); + new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts); + gcc_assert (!new_bb); + } + tree phi_res = copy_ssa_name (op, NULL); + gimple new_phi = create_phi_node (phi_res, loop->header); + set_vinfo_for_stmt (new_phi, + new_stmt_vec_info (new_phi, loop_vinfo, + NULL)); + add_phi_arg (new_phi, arginfo[i].op, + loop_preheader_edge (loop), UNKNOWN_LOCATION); + enum tree_code code + = POINTER_TYPE_P (TREE_TYPE (op)) + ? POINTER_PLUS_EXPR : PLUS_EXPR; + tree type = POINTER_TYPE_P (TREE_TYPE (op)) + ? sizetype : TREE_TYPE (op); + double_int cst + = double_int::from_shwi (arginfo[i].linear_step); + cst *= double_int::from_uhwi (ncopies * nunits); + tree tcst = double_int_to_tree (type, cst); + tree phi_arg = copy_ssa_name (op, NULL); + new_stmt = gimple_build_assign_with_ops (code, phi_arg, + phi_res, tcst); + gimple_stmt_iterator si = gsi_after_labels (loop->header); + gsi_insert_after (&si, new_stmt, GSI_NEW_STMT); + set_vinfo_for_stmt (new_stmt, + new_stmt_vec_info (new_stmt, loop_vinfo, + NULL)); + add_phi_arg (new_phi, phi_arg, loop_latch_edge (loop), + UNKNOWN_LOCATION); + arginfo[i].op = phi_res; + vargs.safe_push (phi_res); + } + else + { + enum tree_code code + = POINTER_TYPE_P (TREE_TYPE (op)) + ? POINTER_PLUS_EXPR : PLUS_EXPR; + tree type = POINTER_TYPE_P (TREE_TYPE (op)) + ? sizetype : TREE_TYPE (op); + double_int cst + = double_int::from_shwi (arginfo[i].linear_step); + cst *= double_int::from_uhwi (j * nunits); + tree tcst = double_int_to_tree (type, cst); + new_temp = make_ssa_name (TREE_TYPE (op), NULL); + new_stmt + = gimple_build_assign_with_ops (code, new_temp, + arginfo[i].op, tcst); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + vargs.safe_push (new_temp); + } + break; + case SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP: + default: + gcc_unreachable (); + } + } + + new_stmt = gimple_build_call_vec (fndecl, vargs); + if (vec_dest) + { + gcc_assert (ratype || TYPE_VECTOR_SUBPARTS (rtype) == nunits); + if (ratype) + new_temp = create_tmp_var (ratype, NULL); + else if (TYPE_VECTOR_SUBPARTS (vectype) + == TYPE_VECTOR_SUBPARTS (rtype)) + new_temp = make_ssa_name (vec_dest, new_stmt); + else + new_temp = make_ssa_name (rtype, new_stmt); + gimple_call_set_lhs (new_stmt, new_temp); + } + vect_finish_stmt_generation (stmt, new_stmt, gsi); + + if (vec_dest) + { + if (TYPE_VECTOR_SUBPARTS (vectype) < nunits) + { + unsigned int k, l; + unsigned int prec = GET_MODE_BITSIZE (TYPE_MODE (vectype)); + k = nunits / TYPE_VECTOR_SUBPARTS (vectype); + gcc_assert ((k & (k - 1)) == 0); + for (l = 0; l < k; l++) + { + tree t; + if (ratype) + { + t = build_fold_addr_expr (new_temp); + t = build2 (MEM_REF, vectype, t, + build_int_cst (TREE_TYPE (t), + l * prec / BITS_PER_UNIT)); + } + else + t = build3 (BIT_FIELD_REF, vectype, new_temp, + build_int_cst (integer_type_node, prec), + build_int_cst (integer_type_node, l * prec)); + new_stmt + = gimple_build_assign_with_ops (TREE_CODE (t), + make_ssa_name (vectype, + NULL), + t, NULL_TREE); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + if (j == 0 && l == 0) + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; + else + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; + + prev_stmt_info = vinfo_for_stmt (new_stmt); + } + + if (ratype) + { + tree clobber = build_constructor (ratype, NULL); + TREE_THIS_VOLATILE (clobber) = 1; + new_stmt = gimple_build_assign (new_temp, clobber); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + } + continue; + } + else if (TYPE_VECTOR_SUBPARTS (vectype) > nunits) + { + unsigned int k = (TYPE_VECTOR_SUBPARTS (vectype) + / TYPE_VECTOR_SUBPARTS (rtype)); + gcc_assert ((k & (k - 1)) == 0); + if ((j & (k - 1)) == 0) + vec_alloc (ret_ctor_elts, k); + if (ratype) + { + unsigned int m, o = nunits / TYPE_VECTOR_SUBPARTS (rtype); + for (m = 0; m < o; m++) + { + tree tem = build4 (ARRAY_REF, rtype, new_temp, + size_int (m), NULL_TREE, NULL_TREE); + new_stmt + = gimple_build_assign_with_ops (ARRAY_REF, rtype, + make_ssa_name (rtype, + NULL), + tem); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + CONSTRUCTOR_APPEND_ELT (ret_ctor_elts, NULL_TREE, tem); + } + tree clobber = build_constructor (ratype, NULL); + TREE_THIS_VOLATILE (clobber) = 1; + new_stmt = gimple_build_assign (new_temp, clobber); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + } + else + CONSTRUCTOR_APPEND_ELT (ret_ctor_elts, NULL_TREE, new_temp); + if ((j & (k - 1)) != k - 1) + continue; + vec_oprnd0 = build_constructor (vectype, ret_ctor_elts); + new_stmt + = gimple_build_assign_with_ops (CONSTRUCTOR, + make_ssa_name (vec_dest, NULL), + vec_oprnd0, NULL_TREE); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + + if ((unsigned) j == k - 1) + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; + else + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; + + prev_stmt_info = vinfo_for_stmt (new_stmt); + continue; + } + else if (ratype) + { + tree t = build_fold_addr_expr (new_temp); + t = build2 (MEM_REF, vectype, t, + build_int_cst (TREE_TYPE (t), 0)); + new_stmt + = gimple_build_assign_with_ops (MEM_REF, vectype, + make_ssa_name (vec_dest, + NULL), t); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + tree clobber = build_constructor (ratype, NULL); + TREE_THIS_VOLATILE (clobber) = 1; + vect_finish_stmt_generation (stmt, + gimple_build_assign (new_temp, + clobber), gsi); + } + } + + if (j == 0) + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; + else + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; + + prev_stmt_info = vinfo_for_stmt (new_stmt); + } + + vargs.release (); + + /* Update the exception handling table with the vector stmt if necessary. */ + if (maybe_clean_or_replace_eh_stmt (stmt, *vec_stmt)) + gimple_purge_dead_eh_edges (gimple_bb (stmt)); + + /* The call in STMT might prevent it from being removed in dce. + We however cannot remove it here, due to the way the ssa name + it defines is mapped to the new definition. So just replace + rhs of the statement with something harmless. */ + + if (slp_node) + return true; + + if (scalar_dest) + { + type = TREE_TYPE (scalar_dest); + if (is_pattern_stmt_p (stmt_info)) + lhs = gimple_call_lhs (STMT_VINFO_RELATED_STMT (stmt_info)); + else + lhs = gimple_call_lhs (stmt); + new_stmt = gimple_build_assign (lhs, build_zero_cst (type)); + } + else + new_stmt = gimple_build_nop (); + set_vinfo_for_stmt (new_stmt, stmt_info); + set_vinfo_for_stmt (stmt, NULL); + STMT_VINFO_STMT (stmt_info) = new_stmt; + gsi_replace (gsi, new_stmt, false); + unlink_stmt_vdef (stmt); + + return true; +} + + /* Function vect_gen_widened_results_half Create a vector stmt whose code, type, number of arguments, and result @@ -5838,6 +6439,7 @@ vect_analyze_stmt (gimple stmt, bool *ne || vectorizable_assignment (stmt, NULL, NULL, NULL) || vectorizable_load (stmt, NULL, NULL, NULL, NULL) || vectorizable_call (stmt, NULL, NULL, NULL) + || vectorizable_simd_clone_call (stmt, NULL, NULL, NULL) || vectorizable_store (stmt, NULL, NULL, NULL) || vectorizable_reduction (stmt, NULL, NULL, NULL) || vectorizable_condition (stmt, NULL, NULL, NULL, 0, NULL)); @@ -5850,6 +6452,7 @@ vect_analyze_stmt (gimple stmt, bool *ne || vectorizable_assignment (stmt, NULL, NULL, node) || vectorizable_load (stmt, NULL, NULL, node, NULL) || vectorizable_call (stmt, NULL, NULL, node) + || vectorizable_simd_clone_call (stmt, NULL, NULL, node) || vectorizable_store (stmt, NULL, NULL, node) || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node)); } @@ -5972,6 +6575,11 @@ vect_transform_stmt (gimple stmt, gimple stmt = gsi_stmt (*gsi); break; + case call_simd_clone_vec_info_type: + done = vectorizable_simd_clone_call (stmt, gsi, &vec_stmt, slp_node); + stmt = gsi_stmt (*gsi); + break; + case reduc_vec_info_type: done = vectorizable_reduction (stmt, gsi, &vec_stmt, slp_node); gcc_assert (done); --- gcc/c/c-decl.c (.../trunk) (revision 205223) +++ gcc/c/c-decl.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -3646,8 +3646,9 @@ c_builtin_function_ext_scope (tree decl) const char *name = IDENTIFIER_POINTER (id); C_DECL_BUILTIN_PROTOTYPE (decl) = prototype_p (type); - bind (id, decl, external_scope, /*invisible=*/false, /*nested=*/false, - UNKNOWN_LOCATION); + if (external_scope) + bind (id, decl, external_scope, /*invisible=*/false, /*nested=*/false, + UNKNOWN_LOCATION); /* Builtins in the implementation namespace are made visible without needing to be explicitly declared. See push_file_scope. */ --- gcc/cp/semantics.c (.../trunk) (revision 205223) +++ gcc/cp/semantics.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -5214,6 +5214,8 @@ finish_omp_clauses (tree clauses) t = mark_rvalue_use (t); if (!processing_template_decl) { + if (TREE_CODE (OMP_CLAUSE_DECL (c)) == PARM_DECL) + t = maybe_constant_value (t); t = fold_build_cleanup_point_expr (TREE_TYPE (t), t); if (TREE_CODE (TREE_TYPE (OMP_CLAUSE_DECL (c))) == POINTER_TYPE) --- gcc/testsuite/gcc.dg/gomp/simd-clones-1.c (.../trunk) (revision 0) +++ gcc/testsuite/gcc.dg/gomp/simd-clones-1.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-options "-fopenmp -fdump-tree-optimized -O3" } */ + +/* Test that functions that have SIMD clone counterparts are not + cloned by IPA-cp. For example, special_add() below has SIMD clones + created for it. However, if IPA-cp later decides to clone a + specialization of special_add(x, 666) when analyzing fillit(), we + will forever keep the vectorizer from using the SIMD versions of + special_add in a loop. + + If IPA-CP gets taught how to adjust the SIMD clones as well, this + test could be removed. */ + +#pragma omp declare simd simdlen(4) +static int __attribute__ ((noinline)) +special_add (int x, int y) +{ + if (y == 666) + return x + y + 123; + else + return x + y; +} + +void fillit(int *tot) +{ + int i; + + for (i=0; i < 10000; ++i) + tot[i] = special_add (i, 666); +} + +/* { dg-final { scan-tree-dump-not "special_add.constprop" "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ --- gcc/testsuite/gcc.dg/gomp/simd-clones-2.c (.../trunk) (revision 0) +++ gcc/testsuite/gcc.dg/gomp/simd-clones-2.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -0,0 +1,26 @@ +/* { dg-options "-fopenmp -fdump-tree-optimized -O" } */ + +#pragma omp declare simd inbranch uniform(c) linear(b:66) +#pragma omp declare simd notinbranch aligned(c:32) +int addit(int a, int b, int *c) +{ + return a + b; +} + +#pragma omp declare simd uniform(a) aligned(a:32) linear(k:1) notinbranch +float setArray(float *a, float x, int k) +{ + a[k] = a[k] + x; + return a[k]; +} + +/* { dg-final { scan-tree-dump "_ZGVbN4ua32vl_setArray" "optimized" { target i?86-*-* x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump "_ZGVbN4vvva32_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump "_ZGVbM4vl66u_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump "_ZGVcN8ua32vl_setArray" "optimized" { target i?86-*-* x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump "_ZGVcN4vvva32_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump "_ZGVcM4vl66u_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump "_ZGVdN8ua32vl_setArray" "optimized" { target i?86-*-* x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump "_ZGVdN8vvva32_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump "_ZGVdM8vl66u_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ --- gcc/testsuite/gcc.dg/gomp/simd-clones-3.c (.../trunk) (revision 0) +++ gcc/testsuite/gcc.dg/gomp/simd-clones-3.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -0,0 +1,18 @@ +/* { dg-options "-fopenmp -fdump-tree-optimized -O2" } */ + +/* Test that if there is no *inbranch clauses, that both the masked and + the unmasked version are created. */ + +#pragma omp declare simd +int addit(int a, int b, int c) +{ + return a + b; +} + +/* { dg-final { scan-tree-dump "_ZGVbN4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump "_ZGVbM4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump "_ZGVcN4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump "_ZGVcM4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump "_ZGVdN8vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump "_ZGVdM8vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ --- gcc/testsuite/gcc.dg/gomp/simd-clones-4.c (.../trunk) (revision 0) +++ gcc/testsuite/gcc.dg/gomp/simd-clones-4.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -0,0 +1,11 @@ +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */ +/* { dg-options "-fopenmp" } */ + +#pragma omp declare simd simdlen(4) notinbranch +int f2 (int a, int b) +{ + if (a > 5) + return a + b; + else + return a - b; +} --- gcc/testsuite/gcc.dg/gomp/simd-clones-5.c (.../trunk) (revision 0) +++ gcc/testsuite/gcc.dg/gomp/simd-clones-5.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -0,0 +1,12 @@ +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */ +/* { dg-options "-fopenmp -w" } */ + +/* ?? The -w above is to inhibit the following warning for now: + a.c:2:6: warning: AVX vector argument without AVX enabled changes + the ABI [enabled by default]. */ + +#pragma omp declare simd notinbranch simdlen(4) +void foo (int *a) +{ + *a = 555; +} --- gcc/testsuite/gcc.dg/gomp/simd-clones-6.c (.../trunk) (revision 0) +++ gcc/testsuite/gcc.dg/gomp/simd-clones-6.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -0,0 +1,11 @@ +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */ +/* { dg-options "-fopenmp" } */ + +/* Test that array subscripts are properly adjusted. */ + +int array[1000]; +#pragma omp declare simd notinbranch simdlen(4) +void foo (int i) +{ + array[i] = 555; +} --- gcc/testsuite/gcc.dg/gomp/simd-clones-7.c (.../trunk) (revision 0) +++ gcc/testsuite/gcc.dg/gomp/simd-clones-7.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -0,0 +1,16 @@ +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */ +/* { dg-options "-fopenmp -w" } */ + +int array[1000]; + +#pragma omp declare simd notinbranch simdlen(4) +void foo (int *a, int b) +{ + a[b] = 555; +} + +#pragma omp declare simd notinbranch simdlen(4) +void bar (int *a) +{ + *a = 555; +} --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-10a.c (.../trunk) (revision 0) +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-10a.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -0,0 +1,17 @@ +/* { dg-do compile } */ + +#include "vect-simd-clone-10.h" + +#pragma omp declare simd notinbranch +extern int +foo (long int a, int b, int c) +{ + return a + b + c; +} + +#pragma omp declare simd notinbranch +extern long int +bar (int a, int b, long int c) +{ + return a + b + c; +} --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c (.../trunk) (revision 0) +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -0,0 +1,83 @@ +/* { dg-additional-options "-fopenmp-simd" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ +/* { dg-additional-sources vect-simd-clone-10a.c } */ + +#include "tree-vect.h" + +#ifndef N +#define N 1024 +#endif + +int a[N], b[N]; +long int c[N]; +unsigned char d[N]; + +#include "vect-simd-clone-10.h" + +__attribute__((noinline)) void +fn1 (void) +{ + int i; + #pragma omp simd + for (i = 0; i < N; i++) + a[i] = foo (c[i], a[i], b[i]) + 6; + #pragma omp simd + for (i = 0; i < N; i++) + c[i] = bar (a[i], b[i], c[i]) * 2; +} + +__attribute__((noinline)) void +fn2 (void) +{ + int i; + #pragma omp simd + for (i = 0; i < N; i++) + { + a[i] = foo (c[i], a[i], b[i]) + 6; + d[i]++; + } + #pragma omp simd + for (i = 0; i < N; i++) + { + c[i] = bar (a[i], b[i], c[i]) * 2; + d[i] /= 2; + } +} + +__attribute__((noinline)) void +fn3 (void) +{ + int i; + for (i = 0; i < N; i++) + { + a[i] = i * 2; + b[i] = 17 + (i % 37); + c[i] = (i & 63); + d[i] = 16 + i; + } +} + +int +main () +{ + int i; + check_vect (); + fn3 (); + fn1 (); + for (i = 0; i < N; i++) + if (a[i] != i * 2 + 23 + (i % 37) + (i & 63) + || b[i] != 17 + (i % 37) + || c[i] != i * 4 + 80 + 4 * (i % 37) + 4 * (i & 63)) + abort (); + fn3 (); + fn2 (); + for (i = 0; i < N; i++) + if (a[i] != i * 2 + 23 + (i % 37) + (i & 63) + || b[i] != 17 + (i % 37) + || c[i] != i * 4 + 80 + 4 * (i % 37) + 4 * (i & 63) + || d[i] != ((unsigned char) (17 + i)) / 2) + abort (); + return 0; +} + +/* { dg-final { cleanup-tree-dump "vect" } } */ --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.h (.../trunk) (revision 0) +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.h (.../branches/gomp-4_0-branch) (revision 205231) @@ -0,0 +1,4 @@ +#pragma omp declare simd notinbranch +extern int foo (long int a, int b, int c); +#pragma omp declare simd notinbranch +extern long int bar (int a, int b, long int c); --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c (.../trunk) (revision 0) +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -0,0 +1,58 @@ +/* { dg-additional-options "-fopenmp-simd" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#include "tree-vect.h" + +#ifndef N +#define N 1024 +#endif + +int array[N]; + +#pragma omp declare simd simdlen(4) notinbranch +#pragma omp declare simd simdlen(4) notinbranch uniform(b) linear(c:3) +#pragma omp declare simd simdlen(8) notinbranch +#pragma omp declare simd simdlen(8) notinbranch uniform(b) linear(c:3) +__attribute__((noinline)) int +foo (int a, int b, int c) +{ + if (a < 30) + return 5; + return a + b + c; +} + +__attribute__((noinline, noclone)) void +bar () +{ + int i; +#pragma omp simd + for (i = 0; i < N; ++i) + array[i] = foo (i, 123, i * 3); +} + +__attribute__((noinline, noclone)) void +baz () +{ + int i; +#pragma omp simd + for (i = 0; i < N; ++i) + array[i] = foo (i, array[i], i * 3); +} + +int +main () +{ + int i; + check_vect (); + bar (); + for (i = 0; i < N; i++) + if (array[i] != (i < 30 ? 5 : i * 4 + 123)) + abort (); + baz (); + for (i = 0; i < N; i++) + if (array[i] != (i < 30 ? 5 : i * 8 + 123)) + abort (); + return 0; +} + +/* { dg-final { cleanup-tree-dump "vect" } } */ --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c (.../trunk) (revision 0) +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -0,0 +1,52 @@ +/* { dg-additional-options "-fopenmp-simd" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#include "tree-vect.h" + +#ifndef N +#define N 1024 +#endif + +int array[N] __attribute__((aligned (32))); + +#pragma omp declare simd simdlen(4) notinbranch aligned(a:16) uniform(a) linear(b) +#pragma omp declare simd simdlen(4) notinbranch aligned(a:32) uniform(a) linear(b) +#pragma omp declare simd simdlen(8) notinbranch aligned(a:16) uniform(a) linear(b) +#pragma omp declare simd simdlen(8) notinbranch aligned(a:32) uniform(a) linear(b) +__attribute__((noinline)) void +foo (int *a, int b, int c) +{ + a[b] = c; +} + +__attribute__((noinline, noclone)) void +bar () +{ + int i; +#pragma omp simd + for (i = 0; i < N; ++i) + foo (array, i, i * array[i]); +} + +__attribute__((noinline, noclone)) void +baz () +{ + int i; + for (i = 0; i < N; i++) + array[i] = 5 * (i & 7); +} + +int +main () +{ + int i; + check_vect (); + baz (); + bar (); + for (i = 0; i < N; i++) + if (array[i] != 5 * (i & 7) * i) + abort (); + return 0; +} + +/* { dg-final { cleanup-tree-dump "vect" } } */ --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c (.../trunk) (revision 0) +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -0,0 +1,45 @@ +/* { dg-additional-options "-fopenmp-simd" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#include "tree-vect.h" + +#ifndef N +#define N 1024 +#endif + +int d[N], e[N]; + +#pragma omp declare simd simdlen(4) notinbranch uniform(b) linear(c:3) +__attribute__((noinline)) int +foo (int a, int b, int c) +{ + if (a < 30) + return 5; + return a + b + c; +} + +__attribute__((noinline, noclone)) void +bar () +{ + int i; +#pragma omp simd + for (i = 0; i < N; ++i) + { + d[i] = foo (i, 123, i * 3); + e[i] = e[i] + i; + } +} + +int +main () +{ + int i; + check_vect (); + bar (); + for (i = 0; i < N; i++) + if (d[i] != (i < 30 ? 5 : i * 4 + 123) || e[i] != i) + abort (); + return 0; +} + +/* { dg-final { cleanup-tree-dump "vect" } } */ --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c (.../trunk) (revision 0) +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -0,0 +1,48 @@ +/* { dg-additional-options "-fopenmp-simd" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#include "tree-vect.h" + +#ifndef N +#define N 1024 +#endif + +float d[N]; +int e[N]; +unsigned short f[N]; + +#pragma omp declare simd simdlen(8) notinbranch uniform(b) +__attribute__((noinline)) float +foo (float a, float b, float c) +{ + if (a < 30) + return 5.0f; + return a + b + c; +} + +__attribute__((noinline, noclone)) void +bar () +{ + int i; +#pragma omp simd + for (i = 0; i < N; ++i) + { + d[i] = foo (i, 123, i * 3); + e[i] = e[i] * 3; + f[i] = f[i] + 1; + } +} + +int +main () +{ + int i; + check_vect (); + bar (); + for (i = 0; i < N; i++) + if (d[i] != (i < 30 ? 5.0f : i * 4 + 123.0f) || e[i] || f[i] != 1) + abort (); + return 0; +} + +/* { dg-final { cleanup-tree-dump "vect" } } */ --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c (.../trunk) (revision 0) +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -0,0 +1,43 @@ +/* { dg-additional-options "-fopenmp-simd" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#include "tree-vect.h" + +#ifndef N +#define N 1024 +#endif + +int d[N], e[N]; + +#pragma omp declare simd simdlen(4) notinbranch uniform(b) linear(c:3) +__attribute__((noinline)) long long int +foo (int a, int b, int c) +{ + return a + b + c; +} + +__attribute__((noinline, noclone)) void +bar () +{ + int i; +#pragma omp simd + for (i = 0; i < N; ++i) + { + d[i] = foo (i, 123, i * 3); + e[i] = e[i] + i; + } +} + +int +main () +{ + int i; + check_vect (); + bar (); + for (i = 0; i < N; i++) + if (d[i] != i * 4 + 123 || e[i] != i) + abort (); + return 0; +} + +/* { dg-final { cleanup-tree-dump "vect" } } */ --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-6.c (.../trunk) (revision 0) +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-6.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -0,0 +1,74 @@ +/* { dg-additional-options "-fopenmp-simd" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#include "tree-vect.h" + +#ifndef N +#define N 1024 +#endif + +int a[N]; +long long int b[N]; +short c[N]; + +#pragma omp declare simd +#pragma omp declare simd uniform(b) linear(c:3) +__attribute__((noinline)) short +foo (int a, long long int b, short c) +{ + return a + b + c; +} + +__attribute__((noinline, noclone)) void +bar (int x) +{ + int i; + if (x == 0) + { + #pragma omp simd + for (i = 0; i < N; i++) + c[i] = foo (a[i], b[i], c[i]); + } + else + { + #pragma omp simd + for (i = 0; i < N; i++) + c[i] = foo (a[i], x, i * 3); + } +} + +__attribute__((noinline, noclone)) void +baz (void) +{ + int i; + for (i = 0; i < N; i++) + { + a[i] = 2 * i; + b[i] = -7 * i + 6; + c[i] = (i & 31) << 4; + } +} + +int +main () +{ + int i; + check_vect (); + baz (); + bar (0); + for (i = 0; i < N; i++) + if (a[i] != 2 * i || b[i] != 6 - 7 * i + || c[i] != 6 - 5 * i + ((i & 31) << 4)) + abort (); + else + a[i] = c[i]; + bar (17); + for (i = 0; i < N; i++) + if (a[i] != 6 - 5 * i + ((i & 31) << 4) + || b[i] != 6 - 7 * i + || c[i] != 23 - 2 * i + ((i & 31) << 4)) + abort (); + return 0; +} + +/* { dg-final { cleanup-tree-dump "vect" } } */ --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-7.c (.../trunk) (revision 0) +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-7.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -0,0 +1,74 @@ +/* { dg-additional-options "-fopenmp-simd" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#include "tree-vect.h" + +#ifndef N +#define N 1024 +#endif + +int a[N]; +long long int b[N]; +short c[N]; + +#pragma omp declare simd +#pragma omp declare simd uniform(b) linear(c:3) +__attribute__((noinline)) short +foo (int a, long long int b, int c) +{ + return a + b + c; +} + +__attribute__((noinline, noclone)) void +bar (int x) +{ + int i; + if (x == 0) + { + #pragma omp simd + for (i = 0; i < N; i++) + c[i] = foo (a[i], b[i], c[i]); + } + else + { + #pragma omp simd + for (i = 0; i < N; i++) + c[i] = foo (a[i], x, i * 3); + } +} + +__attribute__((noinline, noclone)) void +baz (void) +{ + int i; + for (i = 0; i < N; i++) + { + a[i] = 2 * i; + b[i] = -7 * i + 6; + c[i] = (i & 31) << 4; + } +} + +int +main () +{ + int i; + check_vect (); + baz (); + bar (0); + for (i = 0; i < N; i++) + if (a[i] != 2 * i || b[i] != 6 - 7 * i + || c[i] != 6 - 5 * i + ((i & 31) << 4)) + abort (); + else + a[i] = c[i]; + bar (17); + for (i = 0; i < N; i++) + if (a[i] != 6 - 5 * i + ((i & 31) << 4) + || b[i] != 6 - 7 * i + || c[i] != 23 - 2 * i + ((i & 31) << 4)) + abort (); + return 0; +} + +/* { dg-final { cleanup-tree-dump "vect" } } */ --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c (.../trunk) (revision 0) +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -0,0 +1,94 @@ +/* { dg-additional-options "-fopenmp-simd" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#include "tree-vect.h" + +#ifndef N +#define N 1024 +#endif + +int a[N], b[N]; +long int c[N]; +unsigned char d[N]; + +#pragma omp declare simd simdlen(8) notinbranch +__attribute__((noinline)) int +foo (long int a, int b, int c) +{ + return a + b + c; +} + +#pragma omp declare simd simdlen(8) notinbranch +__attribute__((noinline)) long int +bar (int a, int b, long int c) +{ + return a + b + c; +} + +__attribute__((noinline)) void +fn1 (void) +{ + int i; + #pragma omp simd + for (i = 0; i < N; i++) + a[i] = foo (c[i], a[i], b[i]) + 6; + #pragma omp simd + for (i = 0; i < N; i++) + c[i] = bar (a[i], b[i], c[i]) * 2; +} + +__attribute__((noinline)) void +fn2 (void) +{ + int i; + #pragma omp simd + for (i = 0; i < N; i++) + { + a[i] = foo (c[i], a[i], b[i]) + 6; + d[i]++; + } + #pragma omp simd + for (i = 0; i < N; i++) + { + c[i] = bar (a[i], b[i], c[i]) * 2; + d[i] /= 2; + } +} + +__attribute__((noinline)) void +fn3 (void) +{ + int i; + for (i = 0; i < N; i++) + { + a[i] = i * 2; + b[i] = 17 + (i % 37); + c[i] = (i & 63); + d[i] = 16 + i; + } +} + +int +main () +{ + int i; + check_vect (); + fn3 (); + fn1 (); + for (i = 0; i < N; i++) + if (a[i] != i * 2 + 23 + (i % 37) + (i & 63) + || b[i] != 17 + (i % 37) + || c[i] != i * 4 + 80 + 4 * (i % 37) + 4 * (i & 63)) + abort (); + fn3 (); + fn2 (); + for (i = 0; i < N; i++) + if (a[i] != i * 2 + 23 + (i % 37) + (i & 63) + || b[i] != 17 + (i % 37) + || c[i] != i * 4 + 80 + 4 * (i % 37) + 4 * (i & 63) + || d[i] != ((unsigned char) (17 + i)) / 2) + abort (); + return 0; +} + +/* { dg-final { cleanup-tree-dump "vect" } } */ --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-9.c (.../trunk) (revision 0) +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-9.c (.../branches/gomp-4_0-branch) (revision 205231) @@ -0,0 +1,94 @@ +/* { dg-additional-options "-fopenmp-simd" } */ +/* { dg-additional-options "-mavx" { target avx_runtime } } */ + +#include "tree-vect.h" + +#ifndef N +#define N 1024 +#endif + +int a[N], b[N]; +long int c[N]; +unsigned char d[N]; + +#pragma omp declare simd notinbranch +__attribute__((noinline)) static int +foo (long int a, int b, int c) +{ + return a + b + c; +} + +#pragma omp declare simd notinbranch +__attribute__((noinline)) static long int +bar (int a, int b, long int c) +{ + return a + b + c; +} + +__attribute__((noinline)) void +fn1 (void) +{ + int i; + #pragma omp simd + for (i = 0; i < N; i++) + a[i] = foo (c[i], a[i], b[i]) + 6; + #pragma omp simd + for (i = 0; i < N; i++) + c[i] = bar (a[i], b[i], c[i]) * 2; +} + +__attribute__((noinline)) void +fn2 (void) +{ + int i; + #pragma omp simd + for (i = 0; i < N; i++) + { + a[i] = foo (c[i], a[i], b[i]) + 6; + d[i]++; + } + #pragma omp simd + for (i = 0; i < N; i++) + { + c[i] = bar (a[i], b[i], c[i]) * 2; + d[i] /= 2; + } +} + +__attribute__((noinline)) void +fn3 (void) +{ + int i; + for (i = 0; i < N; i++) + { + a[i] = i * 2; + b[i] = 17 + (i % 37); + c[i] = (i & 63); + d[i] = 16 + i; + } +} + +int +main () +{ + int i; + check_vect (); + fn3 (); + fn1 (); + for (i = 0; i < N; i++) + if (a[i] != i * 2 + 23 + (i % 37) + (i & 63) + || b[i] != 17 + (i % 37) + || c[i] != i * 4 + 80 + 4 * (i % 37) + 4 * (i & 63)) + abort (); + fn3 (); + fn2 (); + for (i = 0; i < N; i++) + if (a[i] != i * 2 + 23 + (i % 37) + (i & 63) + || b[i] != 17 + (i % 37) + || c[i] != i * 4 + 80 + 4 * (i % 37) + 4 * (i & 63) + || d[i] != ((unsigned char) (17 + i)) / 2) + abort (); + return 0; +} + +/* { dg-final { cleanup-tree-dump "vect" } } */ --- gcc/testsuite/g++.dg/gomp/declare-simd-1.C (.../trunk) (revision 205223) +++ gcc/testsuite/g++.dg/gomp/declare-simd-1.C (.../branches/gomp-4_0-branch) (revision 205231) @@ -239,5 +239,5 @@ struct D void f38 (D &d) { - d.f37 <12> (6); + d.f37 <16> (6); }