diff mbox

OpenMP #pragma omp declare simd support

Message ID 20131122003437.GS892@tucnak.redhat.com
State New
Headers show

Commit Message

Jakub Jelinek Nov. 22, 2013, 12:34 a.m. UTC
Hi!

Working virtually out of the Azores now.
Here is the full OpenMP 4.0 elementals (#pragma omp declare simd)
support extracted from the gomp-4_0-branch.  Bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?

2013-11-21  Aldy Hernandez  <aldyh@redhat.com>
	    Jakub Jelinek  <jakub@redhat.com>

	* cgraph.h (enum cgraph_simd_clone_arg_type): New.
	(struct cgraph_simd_clone_arg, struct cgraph_simd_clone): New.
	(struct cgraph_node): Add simdclone and simd_clones fields.
	* config/i386/i386.c (ix86_simd_clone_compute_vecsize_and_simdlen,
	ix86_simd_clone_adjust, ix86_simd_clone_usable): New functions.
	(TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN,
	TARGET_SIMD_CLONE_ADJUST, TARGET_SIMD_CLONE_USABLE): Define.
	* doc/tm.texi.in (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN,
	TARGET_SIMD_CLONE_ADJUST, TARGET_SIMD_CLONE_USABLE): Add.
	* doc/tm.texi: Regenerated.
	* expr.c (store_constructor): Allow CONSTRUCTOR with VECTOR_TYPE
	(same sized) elements even if the type of the CONSTRUCTOR has
	vector mode and target is a REG.
	* ggc.h (ggc_alloc_cleared_simd_clone_stat): New function.
	* ipa.c (symtab_remove_unreachable_nodes): If node with simd clones
	is kept, keep also the simd clones.
	* ipa-cp.c (determine_versionability): Fail if node->simd_clones or
	node->simdclone is non-NULL.
	(initialize_node_lattices): Set disable = true for simd clones.
	* ipa-prop.c (get_vector_of_formal_parm_types): Renamed to ...
	(ipa_get_vector_of_formal_parm_types): ... this.  No longer static.
	(ipa_modify_formal_parameters): Adjust caller.  Remove
	synth_parm_prefix argument.  Use operator enum instead of bit fields.
	Add assert for properly handling vector of references.  Handle
	creating brand new parameters.
	(ipa_modify_call_arguments): Use operator enum instead of bit
	fields.
	(ipa_combine_adjustments): Same.  Assert that IPA_PARM_OP_NEW is not
	used.
	(ipa_modify_expr, get_ssa_base_param, ipa_get_adjustment_candidate):
	New functions.
	(ipa_dump_param_adjustments): Rename reduction to new_decl.
	Use operator enum instead of bit fields.
	* ipa-prop.h (enum ipa_parm_op): New.
	(struct ipa_parm_adjustment): New fields op, simdlen.  Rename reduction
	to new_decl, new_arg_prefix to arg_prefix and remove remove_param
	and copy_param.
	(ipa_modify_formal_parameters): Remove last argument.
	(ipa_get_vector_of_formal_parm_types, ipa_modify_expr,
	ipa_get_adjustment_candidate): New prototypes.
	* omp-low.c: Include pretty-print.h and ipa-prop.h.
	(simd_clone_vector_of_formal_parm_types): New function.
	(simd_clone_struct_alloc, simd_clone_struct_copy,
	simd_clone_vector_of_formal_parm_types, simd_clone_clauses_extract,
	simd_clone_compute_base_data_type, simd_clone_mangle,
	simd_clone_create, simd_clone_adjust_return_type,
	create_tmp_simd_array, simd_clone_adjust_argument_types,
	simd_clone_init_simd_arrays): New functions.
	(struct modify_stmt_info): New type.
	(ipa_simd_modify_stmt_ops, ipa_simd_modify_function_body,
	simd_clone_adjust, expand_simd_clones, ipa_omp_simd_clone): New
	functions.
	(pass_data_omp_simd_clone): New variable.
	(pass_omp_simd_clone): New class.
	(make_pass_omp_simd_clone): New function.
	* passes.def (pass_omp_simd_clone): New.
	* target.def (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN,
	TARGET_SIMD_CLONE_ADJUST, TARGET_SIMD_CLONE_USABLE): New target
	hooks.
	* target.h (struct cgraph_node, struct cgraph_simd_node): Declare.
	* tree-core.h (OMP_CLAUSE_LINEAR_VARIABLE_STRIDE): Document.
	* tree.h (OMP_CLAUSE_LINEAR_VARIABLE_STRIDE): Define.
	* tree-pass.h (make_pass_omp_simd_clone): New prototype.
	* tree-sra.c (turn_representatives_into_adjustments): Use operator
	enum.  Set arg_prefix.
	(get_adjustment_for_base): Use operator enum.
	(sra_ipa_modify_expr): Rename to ipa_modify_expr and move to
	ipa-prop.c.
	(sra_ipa_modify_assign): Rename sra_ipa_modify_expr to
	ipa_modify_expr.
	(ipa_sra_modify_function_body): Same.  No longer static.
	(sra_ipa_reset_debug_stmts): Use operator enum.
	(modify_function): Do not pass prefix argument.
	* tree-vect-data-refs.c: Include cgraph.h.
	(vect_analyze_data_refs): Inline by hand find_data_references_in_loop
	and find_data_references_in_bb, if find_data_references_in_stmt
	fails, still allow calls to #pragma omp declare simd functions
	in #pragma omp simd loops unless they contain data references among
	the call arguments or in lhs.
	* tree-vectorizer.h (enum stmt_vec_info_type): Add
	call_simd_clone_vec_info_type.
	* tree-vect-stmts.c: Include tree-ssa-loop.h,
	tree-scalar-evolution.h and cgraph.h.
	(vectorizable_call): Handle calls without lhs.
	(struct simd_call_arg_info): New.
	(vectorizable_simd_clone_call): New function.
	(vect_analyze_stmt, vect_transform_stmt): Call it.
c/
	* c-decl.c (c_builtin_function_ext_scope): Avoid binding if
	external_scope is NULL.
cp/
	* semantics.c (finish_omp_clauses): For #pragma omp declare simd
	linear clause step call maybe_constant_value.
testsuite/
	* g++.dg/gomp/declare-simd-1.C (f38): Make sure
	simdlen is a power of two.
	* gcc.dg/gomp/simd-clones-2.c: Compile on all targets.
	Remove -msse2.  Adjust regexps for name mangling changes.
	* gcc.dg/gomp/simd-clones-3.c: Likewise.
	* gcc.dg/vect/vect-simd-clone-1.c: New test.
	* gcc.dg/vect/vect-simd-clone-2.c: New test.
	* gcc.dg/vect/vect-simd-clone-3.c: New test.
	* gcc.dg/vect/vect-simd-clone-4.c: New test.
	* gcc.dg/vect/vect-simd-clone-5.c: New test.
	* gcc.dg/vect/vect-simd-clone-6.c: New test.
	* gcc.dg/vect/vect-simd-clone-7.c: New test.
	* gcc.dg/vect/vect-simd-clone-8.c: New test.
	* gcc.dg/vect/vect-simd-clone-9.c: New test.
	* gcc.dg/vect/vect-simd-clone-10.c: New test.
	* gcc.dg/vect/vect-simd-clone-10.h: New file.
	* gcc.dg/vect/vect-simd-clone-10a.c: New file.


	Jakub

Comments

Richard Biener Nov. 22, 2013, 10:08 a.m. UTC | #1
On Fri, 22 Nov 2013, Jakub Jelinek wrote:

> Hi!
> 
> Working virtually out of the Azores now.
> Here is the full OpenMP 4.0 elementals (#pragma omp declare simd)
> support extracted from the gomp-4_0-branch.  Bootstrapped/regtested on
> x86_64-linux and i686-linux, ok for trunk?

Comments inline

> 2013-11-21  Aldy Hernandez  <aldyh@redhat.com>
> 	    Jakub Jelinek  <jakub@redhat.com>
> 
> 	* cgraph.h (enum cgraph_simd_clone_arg_type): New.
> 	(struct cgraph_simd_clone_arg, struct cgraph_simd_clone): New.
> 	(struct cgraph_node): Add simdclone and simd_clones fields.
> 	* config/i386/i386.c (ix86_simd_clone_compute_vecsize_and_simdlen,
> 	ix86_simd_clone_adjust, ix86_simd_clone_usable): New functions.
> 	(TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN,
> 	TARGET_SIMD_CLONE_ADJUST, TARGET_SIMD_CLONE_USABLE): Define.
> 	* doc/tm.texi.in (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN,
> 	TARGET_SIMD_CLONE_ADJUST, TARGET_SIMD_CLONE_USABLE): Add.
> 	* doc/tm.texi: Regenerated.
> 	* expr.c (store_constructor): Allow CONSTRUCTOR with VECTOR_TYPE
> 	(same sized) elements even if the type of the CONSTRUCTOR has
> 	vector mode and target is a REG.
> 	* ggc.h (ggc_alloc_cleared_simd_clone_stat): New function.
> 	* ipa.c (symtab_remove_unreachable_nodes): If node with simd clones
> 	is kept, keep also the simd clones.
> 	* ipa-cp.c (determine_versionability): Fail if node->simd_clones or
> 	node->simdclone is non-NULL.
> 	(initialize_node_lattices): Set disable = true for simd clones.
> 	* ipa-prop.c (get_vector_of_formal_parm_types): Renamed to ...
> 	(ipa_get_vector_of_formal_parm_types): ... this.  No longer static.
> 	(ipa_modify_formal_parameters): Adjust caller.  Remove
> 	synth_parm_prefix argument.  Use operator enum instead of bit fields.
> 	Add assert for properly handling vector of references.  Handle
> 	creating brand new parameters.
> 	(ipa_modify_call_arguments): Use operator enum instead of bit
> 	fields.
> 	(ipa_combine_adjustments): Same.  Assert that IPA_PARM_OP_NEW is not
> 	used.
> 	(ipa_modify_expr, get_ssa_base_param, ipa_get_adjustment_candidate):
> 	New functions.
> 	(ipa_dump_param_adjustments): Rename reduction to new_decl.
> 	Use operator enum instead of bit fields.
> 	* ipa-prop.h (enum ipa_parm_op): New.
> 	(struct ipa_parm_adjustment): New fields op, simdlen.  Rename reduction
> 	to new_decl, new_arg_prefix to arg_prefix and remove remove_param
> 	and copy_param.
> 	(ipa_modify_formal_parameters): Remove last argument.
> 	(ipa_get_vector_of_formal_parm_types, ipa_modify_expr,
> 	ipa_get_adjustment_candidate): New prototypes.
> 	* omp-low.c: Include pretty-print.h and ipa-prop.h.
> 	(simd_clone_vector_of_formal_parm_types): New function.
> 	(simd_clone_struct_alloc, simd_clone_struct_copy,
> 	simd_clone_vector_of_formal_parm_types, simd_clone_clauses_extract,
> 	simd_clone_compute_base_data_type, simd_clone_mangle,
> 	simd_clone_create, simd_clone_adjust_return_type,
> 	create_tmp_simd_array, simd_clone_adjust_argument_types,
> 	simd_clone_init_simd_arrays): New functions.
> 	(struct modify_stmt_info): New type.
> 	(ipa_simd_modify_stmt_ops, ipa_simd_modify_function_body,
> 	simd_clone_adjust, expand_simd_clones, ipa_omp_simd_clone): New
> 	functions.
> 	(pass_data_omp_simd_clone): New variable.
> 	(pass_omp_simd_clone): New class.
> 	(make_pass_omp_simd_clone): New function.
> 	* passes.def (pass_omp_simd_clone): New.
> 	* target.def (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN,
> 	TARGET_SIMD_CLONE_ADJUST, TARGET_SIMD_CLONE_USABLE): New target
> 	hooks.
> 	* target.h (struct cgraph_node, struct cgraph_simd_node): Declare.
> 	* tree-core.h (OMP_CLAUSE_LINEAR_VARIABLE_STRIDE): Document.
> 	* tree.h (OMP_CLAUSE_LINEAR_VARIABLE_STRIDE): Define.
> 	* tree-pass.h (make_pass_omp_simd_clone): New prototype.
> 	* tree-sra.c (turn_representatives_into_adjustments): Use operator
> 	enum.  Set arg_prefix.
> 	(get_adjustment_for_base): Use operator enum.
> 	(sra_ipa_modify_expr): Rename to ipa_modify_expr and move to
> 	ipa-prop.c.
> 	(sra_ipa_modify_assign): Rename sra_ipa_modify_expr to
> 	ipa_modify_expr.
> 	(ipa_sra_modify_function_body): Same.  No longer static.
> 	(sra_ipa_reset_debug_stmts): Use operator enum.
> 	(modify_function): Do not pass prefix argument.
> 	* tree-vect-data-refs.c: Include cgraph.h.
> 	(vect_analyze_data_refs): Inline by hand find_data_references_in_loop
> 	and find_data_references_in_bb, if find_data_references_in_stmt
> 	fails, still allow calls to #pragma omp declare simd functions
> 	in #pragma omp simd loops unless they contain data references among
> 	the call arguments or in lhs.
> 	* tree-vectorizer.h (enum stmt_vec_info_type): Add
> 	call_simd_clone_vec_info_type.
> 	* tree-vect-stmts.c: Include tree-ssa-loop.h,
> 	tree-scalar-evolution.h and cgraph.h.
> 	(vectorizable_call): Handle calls without lhs.
> 	(struct simd_call_arg_info): New.
> 	(vectorizable_simd_clone_call): New function.
> 	(vect_analyze_stmt, vect_transform_stmt): Call it.
> c/
> 	* c-decl.c (c_builtin_function_ext_scope): Avoid binding if
> 	external_scope is NULL.
> cp/
> 	* semantics.c (finish_omp_clauses): For #pragma omp declare simd
> 	linear clause step call maybe_constant_value.
> testsuite/
> 	* g++.dg/gomp/declare-simd-1.C (f38): Make sure
> 	simdlen is a power of two.
> 	* gcc.dg/gomp/simd-clones-2.c: Compile on all targets.
> 	Remove -msse2.  Adjust regexps for name mangling changes.
> 	* gcc.dg/gomp/simd-clones-3.c: Likewise.
> 	* gcc.dg/vect/vect-simd-clone-1.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-2.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-3.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-4.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-5.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-6.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-7.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-8.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-9.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-10.c: New test.
> 	* gcc.dg/vect/vect-simd-clone-10.h: New file.
> 	* gcc.dg/vect/vect-simd-clone-10a.c: New file.
> 
> --- gcc/cgraph.h	(.../trunk)	(revision 205223)
> +++ gcc/cgraph.h	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -256,6 +261,99 @@ struct GTY(()) cgraph_clone_info
>    bitmap combined_args_to_skip;
>  };
>  
> +enum cgraph_simd_clone_arg_type
> +{
> +  SIMD_CLONE_ARG_TYPE_VECTOR,
> +  SIMD_CLONE_ARG_TYPE_UNIFORM,
> +  SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP,
> +  SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP,
> +  SIMD_CLONE_ARG_TYPE_MASK
> +};
> +
> +/* Function arguments in the original function of a SIMD clone.
> +   Supplementary data for `struct simd_clone'.  */
> +
> +struct GTY(()) cgraph_simd_clone_arg {
> +  /* Original function argument as it originally existed in
> +     DECL_ARGUMENTS.  */
> +  tree orig_arg;
> +
> +  /* orig_arg's function (or for extern functions type from
> +     TYPE_ARG_TYPES).  */
> +  tree orig_type;
> +
> +  /* If argument is a vector, this holds the vector version of
> +     orig_arg that after adjusting the argument types will live in
> +     DECL_ARGUMENTS.  Otherwise, this is NULL.
> +
> +     This basically holds:
> +       vector(simdlen) __typeof__(orig_arg) new_arg.  */
> +  tree vector_arg;
> +
> +  /* vector_arg's type (or for extern functions new vector type.  */
> +  tree vector_type;
> +
> +  /* If argument is a vector, this holds the array where the simd
> +     argument is held while executing the simd clone function.  This
> +     is a local variable in the cloned function.  Its content is
> +     copied from vector_arg upon entry to the clone.
> +
> +     This basically holds:
> +       __typeof__(orig_arg) simd_array[simdlen].  */
> +  tree simd_array;
> +
> +  /* A SIMD clone's argument can be either linear (constant or
> +     variable), uniform, or vector.  */
> +  enum cgraph_simd_clone_arg_type arg_type;
> +
> +  /* For arg_type SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP this is
> +     the constant linear step, if arg_type is
> +     SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP, this is index of
> +     the uniform argument holding the step, otherwise 0.  */
> +  HOST_WIDE_INT linear_step;
> +
> +  /* Variable alignment if available, otherwise 0.  */
> +  unsigned int alignment;
> +};
> +
> +/* Specific data for a SIMD function clone.  */
> +
> +struct GTY(()) cgraph_simd_clone {
> +  /* Number of words in the SIMD lane associated with this clone.  */
> +  unsigned int simdlen;
> +
> +  /* Number of annotated function arguments in `args'.  This is
> +     usually the number of named arguments in FNDECL.  */
> +  unsigned int nargs;
> +
> +  /* Max hardware vector size in bits for integral vectors.  */
> +  unsigned int vecsize_int;
> +
> +  /* Max hardware vector size in bits for floating point vectors.  */
> +  unsigned int vecsize_float;
> +
> +  /* The mangling character for a given vector size.  This is is used
> +     to determine the ISA mangling bit as specified in the Intel
> +     Vector ABI.  */
> +  unsigned char vecsize_mangle;
> +
> +  /* True if this is the masked, in-branch version of the clone,
> +     otherwise false.  */
> +  unsigned int inbranch : 1;
> +
> +  /* True if this is a Cilk Plus variant.  */
> +  unsigned int cilk_elemental : 1;
> +
> +  /* Doubly linked list of SIMD clones.  */
> +  struct cgraph_node *prev_clone, *next_clone;
> +
> +  /* Original cgraph node the SIMD clones were created for.  */
> +  struct cgraph_node *origin;
> +
> +  /* Annotated function arguments for the original function.  */
> +  struct cgraph_simd_clone_arg GTY((length ("%h.nargs"))) args[1];
> +};
> +
>  
>  /* The cgraph data structure.
>     Each function decl has assigned cgraph_node listing callees and callers.  */
> @@ -284,6 +382,12 @@ public:
>    /* Declaration node used to be clone of. */
>    tree former_clone_of;
>  
> +  /* If this is a SIMD clone, this points to the SIMD specific
> +     information for it.  */
> +  struct cgraph_simd_clone *simdclone;
> +  /* If this function has SIMD clones, this points to the first clone.  */
> +  struct cgraph_node *simd_clones;
> +

I wonder how you run all of this through LTO (I'll see below I guess ;))

>    /* Interprocedural passes scheduled to have their transform functions
>       applied next time we execute local pass on them.  We maintain it
>       per-function in order to allow IPA passes to introduce new functions.  */
> --- gcc/config/i386/i386.c	(.../trunk)	(revision 205223)
> +++ gcc/config/i386/i386.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -43683,6 +43683,172 @@ ix86_memmodel_check (unsigned HOST_WIDE_
>    return val;
>  }
>  
> +/* Set CLONEI->vecsize_mangle, CLONEI->vecsize_int,
> +   CLONEI->vecsize_float and if CLONEI->simdlen is 0, also
> +   CLONEI->simdlen.  Return 0 if SIMD clones shouldn't be emitted,
> +   or number of vecsize_mangle variants that should be emitted.  */
> +
> +static int
> +ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
> +					     struct cgraph_simd_clone *clonei,
> +					     tree base_type, int num)
> +{
> +  int ret = 1;
> +
> +  if (clonei->simdlen
> +      && (clonei->simdlen < 2
> +	  || clonei->simdlen > 16
> +	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
> +    {
> +      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> +		  "unsupported simdlen %d\n", clonei->simdlen);
> +      return 0;
> +    }
> +
> +  tree ret_type = TREE_TYPE (TREE_TYPE (node->decl));
> +  if (TREE_CODE (ret_type) != VOID_TYPE)
> +    switch (TYPE_MODE (ret_type))
> +      {
> +      case QImode:
> +      case HImode:
> +      case SImode:
> +      case DImode:
> +      case SFmode:
> +      case DFmode:
> +      /* case SCmode: */
> +      /* case DCmode: */
> +	break;
> +      default:
> +	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> +		    "unsupported return type %qT for simd\n", ret_type);
> +	return 0;
> +      }
> +
> +  tree t;
> +  int i;
> +
> +  for (t = DECL_ARGUMENTS (node->decl), i = 0; t; t = DECL_CHAIN (t), i++)
> +    /* FIXME: Shouldn't we allow such arguments if they are uniform?  */
> +    switch (TYPE_MODE (TREE_TYPE (t)))
> +      {
> +      case QImode:
> +      case HImode:
> +      case SImode:
> +      case DImode:
> +      case SFmode:
> +      case DFmode:
> +      /* case SCmode: */
> +      /* case DCmode: */
> +	break;
> +      default:
> +	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
> +		    "unsupported argument type %qT for simd\n", TREE_TYPE (t));
> +	return 0;
> +      }
> +
> +  if (clonei->cilk_elemental)
> +    {
> +      /* Parse here processor clause.  If not present, default to 'b'.  */
> +      clonei->vecsize_mangle = 'b';
> +    }
> +  else
> +    {
> +      clonei->vecsize_mangle = "bcd"[num];
> +      ret = 3;
> +    }
> +  switch (clonei->vecsize_mangle)
> +    {
> +    case 'b':
> +      clonei->vecsize_int = 128;
> +      clonei->vecsize_float = 128;
> +      break;
> +    case 'c':
> +      clonei->vecsize_int = 128;
> +      clonei->vecsize_float = 256;
> +      break;
> +    case 'd':
> +      clonei->vecsize_int = 256;
> +      clonei->vecsize_float = 256;
> +      break;
> +    }
> +  if (clonei->simdlen == 0)
> +    {
> +      if (SCALAR_INT_MODE_P (TYPE_MODE (base_type)))
> +	clonei->simdlen = clonei->vecsize_int;
> +      else
> +	clonei->simdlen = clonei->vecsize_float;
> +      clonei->simdlen /= GET_MODE_BITSIZE (TYPE_MODE (base_type));
> +      if (clonei->simdlen > 16)
> +	clonei->simdlen = 16;
> +    }
> +  return ret;
> +}
> +
> +/* Add target attribute to SIMD clone NODE if needed.  */
> +
> +static void
> +ix86_simd_clone_adjust (struct cgraph_node *node)
> +{
> +  const char *str = NULL;
> +  gcc_assert (node->decl == cfun->decl);
> +  switch (node->simdclone->vecsize_mangle)
> +    {
> +    case 'b':
> +      if (!TARGET_SSE2)
> +	str = "sse2";
> +      break;
> +    case 'c':
> +      if (!TARGET_AVX)
> +	str = "avx";
> +      break;
> +    case 'd':
> +      if (!TARGET_AVX2)
> +	str = "avx2";
> +      break;
> +    default:
> +      gcc_unreachable ();
> +    }
> +  if (str == NULL)
> +    return;
> +  push_cfun (NULL);
> +  tree args = build_tree_list (NULL_TREE, build_string (strlen (str), str));
> +  bool ok = ix86_valid_target_attribute_p (node->decl, NULL, args, 0);
> +  gcc_assert (ok);
> +  pop_cfun ();
> +  ix86_previous_fndecl = NULL_TREE;
> +  ix86_set_current_function (node->decl);
> +}
> +
> +/* If SIMD clone NODE can't be used in a vectorized loop
> +   in current function, return -1, otherwise return a badness of using it
> +   (0 if it is most desirable from vecsize_mangle point of view, 1
> +   slightly less desirable, etc.).  */
> +
> +static int
> +ix86_simd_clone_usable (struct cgraph_node *node)
> +{
> +  switch (node->simdclone->vecsize_mangle)
> +    {
> +    case 'b':
> +      if (!TARGET_SSE2)
> +	return -1;
> +      if (!TARGET_AVX)
> +	return 0;
> +      return TARGET_AVX2 ? 2 : 1;
> +    case 'c':
> +      if (!TARGET_AVX)
> +	return -1;
> +      return TARGET_AVX2 ? 1 : 0;
> +      break;
> +    case 'd':
> +      if (!TARGET_AVX2)
> +	return -1;
> +      return 0;
> +    default:
> +      gcc_unreachable ();
> +    }
> +}
> +
>  /* Implement TARGET_FLOAT_EXCEPTIONS_ROUNDING_SUPPORTED_P.  */
>  
>  static bool
> @@ -44171,6 +44337,18 @@ ix86_atomic_assign_expand_fenv (tree *ho
>  #undef TARGET_SPILL_CLASS
>  #define TARGET_SPILL_CLASS ix86_spill_class
>  
> +#undef TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
> +#define TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN \
> +  ix86_simd_clone_compute_vecsize_and_simdlen
> +
> +#undef TARGET_SIMD_CLONE_ADJUST
> +#define TARGET_SIMD_CLONE_ADJUST \
> +  ix86_simd_clone_adjust
> +
> +#undef TARGET_SIMD_CLONE_USABLE
> +#define TARGET_SIMD_CLONE_USABLE \
> +  ix86_simd_clone_usable
> +
>  #undef TARGET_FLOAT_EXCEPTIONS_ROUNDING_SUPPORTED_P
>  #define TARGET_FLOAT_EXCEPTIONS_ROUNDING_SUPPORTED_P \
>    ix86_float_exceptions_rounding_supported_p
> --- gcc/doc/tm.texi	(.../trunk)	(revision 205223)
> +++ gcc/doc/tm.texi	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -5818,6 +5818,26 @@ The default is @code{NULL_TREE} which me
>  loads.
>  @end deftypefn
>  
> +@deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int})
> +This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
> +fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
> +@var{simdlen} field if it was previously 0.
> +The hook should return 0 if SIMD clones shouldn't be emitted,
> +or number of @var{vecsize_mangle} variants that should be emitted.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} void TARGET_SIMD_CLONE_ADJUST (struct cgraph_node *@var{})
> +This hook should add implicit @code{attribute(target("..."))} attribute
> +to SIMD clone @var{node} if needed.
> +@end deftypefn
> +
> +@deftypefn {Target Hook} int TARGET_SIMD_CLONE_USABLE (struct cgraph_node *@var{})
> +This hook should return -1 if SIMD clone @var{node} shouldn't be used
> +in vectorized loops in current function, or non-negative number if it is
> +usable.  In that case, the smaller the number is, the more desirable it is
> +to use it.
> +@end deftypefn
> +
>  @node Anchored Addresses
>  @section Anchored Addresses
>  @cindex anchored addresses
> --- gcc/doc/tm.texi.in	(.../trunk)	(revision 205223)
> +++ gcc/doc/tm.texi.in	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -4422,6 +4422,12 @@ address;  but often a machine-dependent
>  
>  @hook TARGET_VECTORIZE_BUILTIN_GATHER
>  
> +@hook TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
> +
> +@hook TARGET_SIMD_CLONE_ADJUST
> +
> +@hook TARGET_SIMD_CLONE_USABLE
> +
>  @node Anchored Addresses
>  @section Anchored Addresses
>  @cindex anchored addresses
> --- gcc/expr.c	(.../trunk)	(revision 205223)
> +++ gcc/expr.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -6305,6 +6305,18 @@ store_constructor (tree exp, rtx target,
>  	    enum machine_mode mode = GET_MODE (target);
>  
>  	    icode = (int) optab_handler (vec_init_optab, mode);
> +	    /* Don't use vec_init<mode> if some elements have VECTOR_TYPE.  */
> +	    if (icode != CODE_FOR_nothing)
> +	      {
> +		tree value;
> +
> +		FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), idx, value)
> +		  if (TREE_CODE (TREE_TYPE (value)) == VECTOR_TYPE)
> +		    {
> +		      icode = CODE_FOR_nothing;
> +		      break;
> +		    }
> +	      }
>  	    if (icode != CODE_FOR_nothing)
>  	      {
>  		unsigned int i;
> @@ -6382,8 +6394,8 @@ store_constructor (tree exp, rtx target,
>  
>  	    if (vector)
>  	      {
> -	        /* Vector CONSTRUCTORs should only be built from smaller
> -		   vectors in the case of BLKmode vectors.  */
> +		/* vec_init<mode> should not be used if there are VECTOR_TYPE
> +		   elements.  */
>  		gcc_assert (TREE_CODE (TREE_TYPE (value)) != VECTOR_TYPE);
>  		RTVEC_ELT (vector, eltpos)
>  		  = expand_normal (value);

The expr.c hunk is also ok independently of the patch.

> --- gcc/ggc.h	(.../trunk)	(revision 205223)
> +++ gcc/ggc.h	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -276,4 +276,11 @@ ggc_alloc_cleared_gimple_statement_stat
>      ggc_internal_cleared_alloc_stat (s PASS_MEM_STAT);
>  }
>  
> +static inline struct simd_clone *
> +ggc_alloc_cleared_simd_clone_stat (size_t s MEM_STAT_DECL)
> +{
> +  return (struct simd_clone *)
> +    ggc_internal_cleared_alloc_stat (s PASS_MEM_STAT);
> +}
> +
>  #endif
> --- gcc/ipa.c	(.../trunk)	(revision 205223)
> +++ gcc/ipa.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -247,7 +247,7 @@ walk_polymorphic_call_targets (pointer_s
>       hope calls to them will be devirtualized. 
>  
>       Again we remove them after inlining.  In late optimization some
> -     devirtualization may happen, but it is not importnat since we won't inline
> +     devirtualization may happen, but it is not important since we won't inline
>       the call. In theory early opts and IPA should work out all important cases.
>  
>     - virtual clones needs bodies of their origins for later materialization;
> @@ -275,7 +275,7 @@ walk_polymorphic_call_targets (pointer_s
>     by reachable symbols or origins of clones).  The queue is represented
>     as linked list by AUX pointer terminated by 1.
>  
> -   A the end we keep all reachable symbols. For symbols in boundary we always
> +   At the end we keep all reachable symbols. For symbols in boundary we always
>     turn definition into a declaration, but we may keep function body around
>     based on body_needed_for_clonning
>  
> @@ -427,6 +427,19 @@ symtab_remove_unreachable_nodes (bool be
>  		      enqueue_node (cnode, &first, reachable);
>  		    }
>  		}
> +
> +	    }
> +	  /* If any reachable function has simd clones, mark them as
> +	     reachable as well.  */
> +	  if (cnode->simd_clones)
> +	    {
> +	      cgraph_node *next;
> +	      for (next = cnode->simd_clones;
> +		   next;
> +		   next = next->simdclone->next_clone)
> +		if (in_boundary_p
> +		    || !pointer_set_insert (reachable, next))
> +		  enqueue_node (next, &first, reachable);
>  	    }
>  	}
>        /* When we see constructor of external variable, keep referred nodes in the
> --- gcc/ipa-cp.c	(.../trunk)	(revision 205223)
> +++ gcc/ipa-cp.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -429,6 +429,15 @@ determine_versionability (struct cgraph_
>      reason = "not a tree_versionable_function";
>    else if (cgraph_function_body_availability (node) <= AVAIL_OVERWRITABLE)
>      reason = "insufficient body availability";
> +  else if (node->simd_clones != NULL)
> +    {
> +      /* Ideally we should clone the SIMD clones themselves and create
> +	 vector copies of them, so IPA-cp and SIMD clones can happily
> +	 coexist, but that may not be worth the effort.  */
> +      reason = "function has SIMD clones";
> +    }
> +  else if (node->simdclone != NULL)
> +    reason = "function is SIMD clone";
>  
>    if (reason && dump_file && !node->alias && !node->thunk.thunk_p)
>      fprintf (dump_file, "Function %s/%i is not versionable, reason: %s.\n",
> @@ -695,6 +704,8 @@ initialize_node_lattices (struct cgraph_
>        else
>  	disable = true;
>      }
> +  else if (node->simdclone)
> +    disable = true;
>  
>    if (disable || variable)
>      {
> --- gcc/ipa-prop.c	(.../trunk)	(revision 205223)
> +++ gcc/ipa-prop.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -3355,8 +3355,8 @@ ipa_get_vector_of_formal_parms (tree fnd
>  /* Return a heap allocated vector containing types of formal parameters of
>     function type FNTYPE.  */
>  
> -static inline vec<tree> 
> -get_vector_of_formal_parm_types (tree fntype)
> +vec<tree>
> +ipa_get_vector_of_formal_parm_types (tree fntype)
>  {
>    vec<tree> types;
>    int count = 0;
> @@ -3378,32 +3378,22 @@ get_vector_of_formal_parm_types (tree fn
>     base_index field.  */
>  
>  void
> -ipa_modify_formal_parameters (tree fndecl, ipa_parm_adjustment_vec adjustments,
> -			      const char *synth_parm_prefix)
> +ipa_modify_formal_parameters (tree fndecl, ipa_parm_adjustment_vec adjustments)
>  {
> -  vec<tree> oparms, otypes;
> -  tree orig_type, new_type = NULL;
> -  tree old_arg_types, t, new_arg_types = NULL;
> -  tree parm, *link = &DECL_ARGUMENTS (fndecl);
> -  int i, len = adjustments.length ();
> -  tree new_reversed = NULL;
> -  bool care_for_types, last_parm_void;
> -
> -  if (!synth_parm_prefix)
> -    synth_parm_prefix = "SYNTH";
> -
> -  oparms = ipa_get_vector_of_formal_parms (fndecl);
> -  orig_type = TREE_TYPE (fndecl);
> -  old_arg_types = TYPE_ARG_TYPES (orig_type);
> +  vec<tree> oparms = ipa_get_vector_of_formal_parms (fndecl);
> +  tree orig_type = TREE_TYPE (fndecl);
> +  tree old_arg_types = TYPE_ARG_TYPES (orig_type);
>  
>    /* The following test is an ugly hack, some functions simply don't have any
>       arguments in their type.  This is probably a bug but well... */
> -  care_for_types = (old_arg_types != NULL_TREE);
> +  bool care_for_types = (old_arg_types != NULL_TREE);
> +  bool last_parm_void;
> +  vec<tree> otypes;
>    if (care_for_types)
>      {
>        last_parm_void = (TREE_VALUE (tree_last (old_arg_types))
>  			== void_type_node);
> -      otypes = get_vector_of_formal_parm_types (orig_type);
> +      otypes = ipa_get_vector_of_formal_parm_types (orig_type);
>        if (last_parm_void)
>  	gcc_assert (oparms.length () + 1 == otypes.length ());
>        else
> @@ -3415,16 +3405,23 @@ ipa_modify_formal_parameters (tree fndec
>        otypes.create (0);
>      }
>  
> -  for (i = 0; i < len; i++)
> +  int len = adjustments.length ();
> +  tree *link = &DECL_ARGUMENTS (fndecl);
> +  tree new_arg_types = NULL;
> +  for (int i = 0; i < len; i++)
>      {
>        struct ipa_parm_adjustment *adj;
>        gcc_assert (link);
>  
>        adj = &adjustments[i];
> -      parm = oparms[adj->base_index];
> +      tree parm;
> +      if (adj->op == IPA_PARM_OP_NEW)
> +	parm = NULL;
> +      else
> +	parm = oparms[adj->base_index];
>        adj->base = parm;
>  
> -      if (adj->copy_param)
> +      if (adj->op == IPA_PARM_OP_COPY)
>  	{
>  	  if (care_for_types)
>  	    new_arg_types = tree_cons (NULL_TREE, otypes[adj->base_index],
> @@ -3432,23 +3429,36 @@ ipa_modify_formal_parameters (tree fndec
>  	  *link = parm;
>  	  link = &DECL_CHAIN (parm);
>  	}
> -      else if (!adj->remove_param)
> +      else if (adj->op != IPA_PARM_OP_REMOVE)
>  	{
>  	  tree new_parm;
>  	  tree ptype;
>  
> -	  if (adj->by_ref)
> -	    ptype = build_pointer_type (adj->type);
> +	  if (adj->simdlen)
> +	    {
> +	      /* If we have a non-null simdlen but by_ref is true, we
> +		 want a vector of pointers.  Build the vector of
> +		 pointers here, not a pointer to a vector in the
> +		 adj->by_ref case below.  */
> +	      ptype = build_vector_type (adj->type, adj->simdlen);
> +	    }
> +	  else if (adj->by_ref)
> +	    {
> +	      ptype = build_pointer_type (adj->type);
> +	    }
>  	  else
> -	    ptype = adj->type;
> +	    {
> +	      gcc_checking_assert (!adj->by_ref || adj->simdlen);
> +	      ptype = adj->type;
> +	    }
>  
>  	  if (care_for_types)
>  	    new_arg_types = tree_cons (NULL_TREE, ptype, new_arg_types);
>  
>  	  new_parm = build_decl (UNKNOWN_LOCATION, PARM_DECL, NULL_TREE,
>  				 ptype);
> -	  DECL_NAME (new_parm) = create_tmp_var_name (synth_parm_prefix);
> -
> +	  const char *prefix = adj->arg_prefix ? adj->arg_prefix : "SYNTH";
> +	  DECL_NAME (new_parm) = create_tmp_var_name (prefix);
>  	  DECL_ARTIFICIAL (new_parm) = 1;
>  	  DECL_ARG_TYPE (new_parm) = ptype;
>  	  DECL_CONTEXT (new_parm) = fndecl;
> @@ -3456,17 +3466,20 @@ ipa_modify_formal_parameters (tree fndec
>  	  DECL_IGNORED_P (new_parm) = 1;
>  	  layout_decl (new_parm, 0);
>  
> -	  adj->base = parm;
> -	  adj->reduction = new_parm;
> +	  if (adj->op == IPA_PARM_OP_NEW)
> +	    adj->base = NULL;
> +	  else
> +	    adj->base = parm;
> +	  adj->new_decl = new_parm;
>  
>  	  *link = new_parm;
> -
>  	  link = &DECL_CHAIN (new_parm);
>  	}
>      }
>  
>    *link = NULL_TREE;
>  
> +  tree new_reversed = NULL;
>    if (care_for_types)
>      {
>        new_reversed = nreverse (new_arg_types);
> @@ -3484,8 +3497,9 @@ ipa_modify_formal_parameters (tree fndec
>       Exception is METHOD_TYPEs must have THIS argument.
>       When we are asked to remove it, we need to build new FUNCTION_TYPE
>       instead.  */
> +  tree new_type = NULL;
>    if (TREE_CODE (orig_type) != METHOD_TYPE
> -       || (adjustments[0].copy_param
> +       || (adjustments[0].op == IPA_PARM_OP_COPY
>  	  && adjustments[0].base_index == 0))
>      {
>        new_type = build_distinct_type_copy (orig_type);
> @@ -3509,7 +3523,7 @@ ipa_modify_formal_parameters (tree fndec
>  
>    /* This is a new type, not a copy of an old type.  Need to reassociate
>       variants.  We can handle everything except the main variant lazily.  */
> -  t = TYPE_MAIN_VARIANT (orig_type);
> +  tree t = TYPE_MAIN_VARIANT (orig_type);
>    if (orig_type != t)
>      {
>        TYPE_MAIN_VARIANT (new_type) = t;
> @@ -3558,13 +3572,13 @@ ipa_modify_call_arguments (struct cgraph
>  
>        adj = &adjustments[i];
>  
> -      if (adj->copy_param)
> +      if (adj->op == IPA_PARM_OP_COPY)
>  	{
>  	  tree arg = gimple_call_arg (stmt, adj->base_index);
>  
>  	  vargs.quick_push (arg);
>  	}
> -      else if (!adj->remove_param)
> +      else if (adj->op != IPA_PARM_OP_REMOVE)
>  	{
>  	  tree expr, base, off;
>  	  location_t loc;
> @@ -3683,7 +3697,7 @@ ipa_modify_call_arguments (struct cgraph
>  					   NULL, true, GSI_SAME_STMT);
>  	  vargs.quick_push (expr);
>  	}
> -      if (!adj->copy_param && MAY_HAVE_DEBUG_STMTS)
> +      if (adj->op != IPA_PARM_OP_COPY && MAY_HAVE_DEBUG_STMTS)
>  	{
>  	  unsigned int ix;
>  	  tree ddecl = NULL_TREE, origin = DECL_ORIGIN (adj->base), arg;
> @@ -3758,6 +3772,124 @@ ipa_modify_call_arguments (struct cgraph
>    free_dominance_info (CDI_DOMINATORS);
>  }

You've run the above through Martin IIRC, but ...

> +/* If the expression *EXPR should be replaced by a reduction of a parameter, do
> +   so.  ADJUSTMENTS is a pointer to a vector of adjustments.  CONVERT
> +   specifies whether the function should care about type incompatibility the
> +   current and new expressions.  If it is false, the function will leave
> +   incompatibility issues to the caller.  Return true iff the expression
> +   was modified. */
> +
> +bool
> +ipa_modify_expr (tree *expr, bool convert,
> +		 ipa_parm_adjustment_vec adjustments)
> +{
> +  struct ipa_parm_adjustment *cand
> +    = ipa_get_adjustment_candidate (&expr, &convert, adjustments, false);
> +  if (!cand)
> +    return false;
> +
> +  tree src;
> +  if (cand->by_ref)
> +    src = build_simple_mem_ref (cand->new_decl);

is this function mostly copied from elsewhere?  Because
using build_simple_mem_ref always smells like possible TBAA problems.

> +  else
> +    src = cand->new_decl;
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "About to replace expr ");
> +      print_generic_expr (dump_file, *expr, 0);
> +      fprintf (dump_file, " with ");
> +      print_generic_expr (dump_file, src, 0);
> +      fprintf (dump_file, "\n");
> +    }
> +
> +  if (convert && !useless_type_conversion_p (TREE_TYPE (*expr), cand->type))
> +    {
> +      tree vce = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (*expr), src);
> +      *expr = vce;

Why build1 and not fold it?  I assume from above you either have a plain 
decl (cand->new_decl) or a MEM_REF.  For both cases simply folding
the VCE into a MEM_REF works.

> +    }
> +  else
> +    *expr = src;
> +  return true;
> +}
> +
> +/* If T is an SSA_NAME, return NULL if it is not a default def or
> +   return its base variable if it is.  If IGNORE_DEFAULT_DEF is true,
> +   the base variable is always returned, regardless if it is a default
> +   def.  Return T if it is not an SSA_NAME.  */
> +
> +static tree
> +get_ssa_base_param (tree t, bool ignore_default_def)
> +{
> +  if (TREE_CODE (t) == SSA_NAME)
> +    {
> +      if (ignore_default_def || SSA_NAME_IS_DEFAULT_DEF (t))
> +	return SSA_NAME_VAR (t);
> +      else
> +	return NULL_TREE;
> +    }
> +  return t;
> +}

This function will return non-NULL for non-PARMs - is that intended?

> +/* Given an expression, return an adjustment entry specifying the
> +   transformation to be done on EXPR.  If no suitable adjustment entry
> +   was found, returns NULL.
> +
> +   If IGNORE_DEFAULT_DEF is set, consider SSA_NAMEs which are not a
> +   default def, otherwise bail on them.
> +
> +   If CONVERT is non-NULL, this function will set *CONVERT if the
> +   expression provided is a component reference.  ADJUSTMENTS is the
> +   adjustments vector.  */
> +
> +ipa_parm_adjustment *
> +ipa_get_adjustment_candidate (tree **expr, bool *convert,
> +			      ipa_parm_adjustment_vec adjustments,
> +			      bool ignore_default_def)
> +{
> +  if (TREE_CODE (**expr) == BIT_FIELD_REF
> +      || TREE_CODE (**expr) == IMAGPART_EXPR
> +      || TREE_CODE (**expr) == REALPART_EXPR)
> +    {
> +      *expr = &TREE_OPERAND (**expr, 0);
> +      if (convert)
> +	*convert = true;
> +    }
> +
> +  HOST_WIDE_INT offset, size, max_size;
> +  tree base = get_ref_base_and_extent (**expr, &offset, &size, &max_size);
> +  if (!base || size == -1 || max_size == -1)
> +    return NULL;
> +
> +  if (TREE_CODE (base) == MEM_REF)
> +    {
> +      offset += mem_ref_offset (base).low * BITS_PER_UNIT;
> +      base = TREE_OPERAND (base, 0);
> +    }
> +
> +  base = get_ssa_base_param (base, ignore_default_def);
> +  if (!base || TREE_CODE (base) != PARM_DECL)
> +    return NULL;
> +
> +  struct ipa_parm_adjustment *cand = NULL;
> +  unsigned int len = adjustments.length ();
> +  for (unsigned i = 0; i < len; i++)
> +    {
> +      struct ipa_parm_adjustment *adj = &adjustments[i];
> +
> +      if (adj->base == base
> +	  && (adj->offset == offset || adj->op == IPA_PARM_OP_REMOVE))
> +	{
> +	  cand = adj;
> +	  break;
> +	}
> +    }
> +
> +  if (!cand || cand->op == IPA_PARM_OP_COPY || cand->op == IPA_PARM_OP_REMOVE)
> +    return NULL;
> +  return cand;
> +}
> +
>  /* Return true iff BASE_INDEX is in ADJUSTMENTS more than once.  */
>  
>  static bool
> @@ -3803,10 +3935,14 @@ ipa_combine_adjustments (ipa_parm_adjust
>        struct ipa_parm_adjustment *n;
>        n = &inner[i];
>  
> -      if (n->remove_param)
> +      if (n->op == IPA_PARM_OP_REMOVE)
>  	removals++;
>        else
> -	tmp.quick_push (*n);
> +	{
> +	  /* FIXME: Handling of new arguments are not implemented yet.  */
> +	  gcc_assert (n->op != IPA_PARM_OP_NEW);
> +	  tmp.quick_push (*n);
> +	}
>      }
>  
>    adjustments.create (outlen + removals);
> @@ -3817,27 +3953,32 @@ ipa_combine_adjustments (ipa_parm_adjust
>        struct ipa_parm_adjustment *in = &tmp[out->base_index];
>  
>        memset (&r, 0, sizeof (r));
> -      gcc_assert (!in->remove_param);
> -      if (out->remove_param)
> +      gcc_assert (in->op != IPA_PARM_OP_REMOVE);
> +      if (out->op == IPA_PARM_OP_REMOVE)
>  	{
>  	  if (!index_in_adjustments_multiple_times_p (in->base_index, tmp))
>  	    {
> -	      r.remove_param = true;
> +	      r.op = IPA_PARM_OP_REMOVE;
>  	      adjustments.quick_push (r);
>  	    }
>  	  continue;
>  	}
> +      else
> +	{
> +	  /* FIXME: Handling of new arguments are not implemented yet.  */
> +	  gcc_assert (out->op != IPA_PARM_OP_NEW);
> +	}
>  
>        r.base_index = in->base_index;
>        r.type = out->type;
>  
>        /* FIXME:  Create nonlocal value too.  */
>  
> -      if (in->copy_param && out->copy_param)
> -	r.copy_param = true;
> -      else if (in->copy_param)
> +      if (in->op == IPA_PARM_OP_COPY && out->op == IPA_PARM_OP_COPY)
> +	r.op = IPA_PARM_OP_COPY;
> +      else if (in->op == IPA_PARM_OP_COPY)
>  	r.offset = out->offset;
> -      else if (out->copy_param)
> +      else if (out->op == IPA_PARM_OP_COPY)
>  	r.offset = in->offset;
>        else
>  	r.offset = in->offset + out->offset;
> @@ -3848,7 +3989,7 @@ ipa_combine_adjustments (ipa_parm_adjust
>      {
>        struct ipa_parm_adjustment *n = &inner[i];
>  
> -      if (n->remove_param)
> +      if (n->op == IPA_PARM_OP_REMOVE)
>  	adjustments.quick_push (*n);
>      }
>  
> @@ -3885,10 +4026,10 @@ ipa_dump_param_adjustments (FILE *file,
>  	  fprintf (file, ", base: ");
>  	  print_generic_expr (file, adj->base, 0);
>  	}
> -      if (adj->reduction)
> +      if (adj->new_decl)
>  	{
> -	  fprintf (file, ", reduction: ");
> -	  print_generic_expr (file, adj->reduction, 0);
> +	  fprintf (file, ", new_decl: ");
> +	  print_generic_expr (file, adj->new_decl, 0);
>  	}
>        if (adj->new_ssa_base)
>  	{
> @@ -3896,9 +4037,9 @@ ipa_dump_param_adjustments (FILE *file,
>  	  print_generic_expr (file, adj->new_ssa_base, 0);
>  	}
>  
> -      if (adj->copy_param)
> +      if (adj->op == IPA_PARM_OP_COPY)
>  	fprintf (file, ", copy_param");
> -      else if (adj->remove_param)
> +      else if (adj->op == IPA_PARM_OP_REMOVE)
>  	fprintf (file, ", remove_param");
>        else
>  	fprintf (file, ", offset %li", (long) adj->offset);
> --- gcc/ipa-prop.h	(.../trunk)	(revision 205223)
> +++ gcc/ipa-prop.h	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -609,6 +609,27 @@ extern alloc_pool ipcp_values_pool;
>  extern alloc_pool ipcp_sources_pool;
>  extern alloc_pool ipcp_agg_lattice_pool;
>  
> +/* Operation to be performed for the parameter in ipa_parm_adjustment
> +   below.  */
> +enum ipa_parm_op {
> +  IPA_PARM_OP_NONE,
> +
> +  /* This describes a brand new parameter.
> +
> +     The field `type' should be set to the new type, `arg_prefix'
> +     should be set to the string prefix for the new DECL_NAME, and
> +     `new_decl' will ultimately hold the newly created argument.  */
> +  IPA_PARM_OP_NEW,
> +
> +  /* This new parameter is an unmodified parameter at index base_index. */
> +  IPA_PARM_OP_COPY,
> +
> +  /* This adjustment describes a parameter that is about to be removed
> +     completely.  Most users will probably need to book keep those so that they
> +     don't leave behinfd any non default def ssa names belonging to them.  */
> +  IPA_PARM_OP_REMOVE
> +};
> +
>  /* Structure to describe transformations of formal parameters and actual
>     arguments.  Each instance describes one new parameter and they are meant to
>     be stored in a vector.  Additionally, most users will probably want to store
> @@ -632,10 +653,11 @@ struct ipa_parm_adjustment
>       arguments.  */
>    tree alias_ptr_type;
>  
> -  /* The new declaration when creating/replacing a parameter.  Created by
> -     ipa_modify_formal_parameters, useful for functions modifying the body
> -     accordingly. */
> -  tree reduction;
> +  /* The new declaration when creating/replacing a parameter.  Created
> +     by ipa_modify_formal_parameters, useful for functions modifying
> +     the body accordingly.  For brand new arguments, this is the newly
> +     created argument.  */
> +  tree new_decl;
>  
>    /* New declaration of a substitute variable that we may use to replace all
>       non-default-def ssa names when a parm decl is going away.  */
> @@ -645,22 +667,23 @@ struct ipa_parm_adjustment
>       is NULL), this is going to be its nonlocalized vars value.  */
>    tree nonlocal_value;
>  
> +  /* This holds the prefix to be used for the new DECL_NAME.  */
> +  const char *arg_prefix;
> +
>    /* Offset into the original parameter (for the cases when the new parameter
>       is a component of an original one).  */
>    HOST_WIDE_INT offset;
>  
> -  /* Zero based index of the original parameter this one is based on.  (ATM
> -     there is no way to insert a new parameter out of the blue because there is
> -     no need but if it arises the code can be easily exteded to do so.)  */
> +  /* Zero based index of the original parameter this one is based on.  */
>    int base_index;
>  
> -  /* This new parameter is an unmodified parameter at index base_index. */
> -  unsigned copy_param : 1;
> -
> -  /* This adjustment describes a parameter that is about to be removed
> -     completely.  Most users will probably need to book keep those so that they
> -     don't leave behinfd any non default def ssa names belonging to them.  */
> -  unsigned remove_param : 1;
> +  /* If non-null, the parameter is a vector of `type' with this many
> +     elements.  */
> +  int simdlen;
> +
> +  /* Whether this parameter is a new parameter, a copy of an old one,
> +     or one about to be removed.  */
> +  enum ipa_parm_op op;
>  
>    /* The parameter is to be passed by reference.  */
>    unsigned by_ref : 1;
> @@ -671,8 +694,8 @@ typedef struct ipa_parm_adjustment ipa_p
>  typedef vec<ipa_parm_adjustment_t> ipa_parm_adjustment_vec;
>  
>  vec<tree> ipa_get_vector_of_formal_parms (tree fndecl);
> -void ipa_modify_formal_parameters (tree fndecl, ipa_parm_adjustment_vec,
> -				   const char *);
> +vec<tree> ipa_get_vector_of_formal_parm_types (tree fntype);
> +void ipa_modify_formal_parameters (tree fndecl, ipa_parm_adjustment_vec);
>  void ipa_modify_call_arguments (struct cgraph_edge *, gimple,
>  				ipa_parm_adjustment_vec);
>  ipa_parm_adjustment_vec ipa_combine_adjustments (ipa_parm_adjustment_vec,
> @@ -690,6 +713,10 @@ tree ipa_value_from_jfunc (struct ipa_no
>  			   struct ipa_jump_func *jfunc);
>  unsigned int ipcp_transform_function (struct cgraph_node *node);
>  void ipa_dump_param (FILE *, struct ipa_node_params *info, int i);
> +bool ipa_modify_expr (tree *, bool, ipa_parm_adjustment_vec);
> +ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *,
> +						   ipa_parm_adjustment_vec,
> +						   bool);
>  
>  
>  /* From tree-sra.c:  */
> --- gcc/omp-low.c.jj	2013-11-21 09:25:07.000000000 +0100
> +++ gcc/omp-low.c	2013-11-21 22:17:19.334300797 +0100
> @@ -61,6 +61,8 @@ along with GCC; see the file COPYING3.
>  #include "omp-low.h"
>  #include "gimple-low.h"
>  #include "tree-cfgcleanup.h"
> +#include "pretty-print.h"
> +#include "ipa-prop.h"
>  #include "tree-nested.h"
>  
>  
> @@ -10573,5 +10677,1151 @@ make_pass_diagnose_omp_blocks (gcc::cont
>  {
>    return new pass_diagnose_omp_blocks (ctxt);
>  }
> +
> +/* SIMD clone supporting code.  */
> +
> +/* Allocate a fresh `simd_clone' and return it.  NARGS is the number
> +   of arguments to reserve space for.  */
> +
> +static struct cgraph_simd_clone *
> +simd_clone_struct_alloc (int nargs)
> +{
> +  struct cgraph_simd_clone *clone_info;
> +  size_t len = (sizeof (struct cgraph_simd_clone)
> +		+ nargs * sizeof (struct cgraph_simd_clone_arg));
> +  clone_info = (struct cgraph_simd_clone *)
> +	       ggc_internal_cleared_alloc_stat (len PASS_MEM_STAT);
> +  return clone_info;
> +}
> +
> +/* Make a copy of the `struct cgraph_simd_clone' in FROM to TO.  */
> +
> +static inline void
> +simd_clone_struct_copy (struct cgraph_simd_clone *to,
> +			struct cgraph_simd_clone *from)
> +{
> +  memcpy (to, from, (sizeof (struct cgraph_simd_clone)
> +		     + from->nargs * sizeof (struct cgraph_simd_clone_arg)));
> +}
> +
> +/* Return vector of parameter types of function FNDECL.  This uses
> +   TYPE_ARG_TYPES if available, otherwise falls back to types of
> +   DECL_ARGUMENTS types.  */
> +
> +vec<tree>
> +simd_clone_vector_of_formal_parm_types (tree fndecl)
> +{
> +  if (TYPE_ARG_TYPES (TREE_TYPE (fndecl)))
> +    return ipa_get_vector_of_formal_parm_types (TREE_TYPE (fndecl));
> +  vec<tree> args = ipa_get_vector_of_formal_parms (fndecl);
> +  unsigned int i;
> +  tree arg;
> +  FOR_EACH_VEC_ELT (args, i, arg)
> +    args[i] = TREE_TYPE (args[i]);
> +  return args;
> +}
> +
> +/* Given a simd function in NODE, extract the simd specific
> +   information from the OMP clauses passed in CLAUSES, and return
> +   the struct cgraph_simd_clone * if it should be cloned.  *INBRANCH_SPECIFIED
> +   is set to TRUE if the `inbranch' or `notinbranch' clause specified,
> +   otherwise set to FALSE.  */
> +
> +static struct cgraph_simd_clone *
> +simd_clone_clauses_extract (struct cgraph_node *node, tree clauses,
> +			    bool *inbranch_specified)
> +{
> +  vec<tree> args = simd_clone_vector_of_formal_parm_types (node->decl);
> +  tree t;
> +  int n;
> +  *inbranch_specified = false;
> +
> +  n = args.length ();
> +  if (n > 0 && args.last () == void_type_node)
> +    n--;
> +
> +  /* To distinguish from an OpenMP simd clone, Cilk Plus functions to
> +     be cloned have a distinctive artificial label in addition to "omp
> +     declare simd".  */
> +  bool cilk_clone
> +    = (flag_enable_cilkplus
> +       && lookup_attribute ("cilk plus elemental",
> +			    DECL_ATTRIBUTES (node->decl)));
> +
> +  /* Allocate one more than needed just in case this is an in-branch
> +     clone which will require a mask argument.  */
> +  struct cgraph_simd_clone *clone_info = simd_clone_struct_alloc (n + 1);
> +  clone_info->nargs = n;
> +  clone_info->cilk_elemental = cilk_clone;
> +
> +  if (!clauses)
> +    {
> +      args.release ();
> +      return clone_info;
> +    }
> +  clauses = TREE_VALUE (clauses);
> +  if (!clauses || TREE_CODE (clauses) != OMP_CLAUSE)
> +    return clone_info;
> +
> +  for (t = clauses; t; t = OMP_CLAUSE_CHAIN (t))
> +    {
> +      switch (OMP_CLAUSE_CODE (t))
> +	{
> +	case OMP_CLAUSE_INBRANCH:
> +	  clone_info->inbranch = 1;
> +	  *inbranch_specified = true;
> +	  break;
> +	case OMP_CLAUSE_NOTINBRANCH:
> +	  clone_info->inbranch = 0;
> +	  *inbranch_specified = true;
> +	  break;
> +	case OMP_CLAUSE_SIMDLEN:
> +	  clone_info->simdlen
> +	    = TREE_INT_CST_LOW (OMP_CLAUSE_SIMDLEN_EXPR (t));
> +	  break;
> +	case OMP_CLAUSE_LINEAR:
> +	  {
> +	    tree decl = OMP_CLAUSE_DECL (t);
> +	    tree step = OMP_CLAUSE_LINEAR_STEP (t);
> +	    int argno = TREE_INT_CST_LOW (decl);
> +	    if (OMP_CLAUSE_LINEAR_VARIABLE_STRIDE (t))
> +	      {
> +		clone_info->args[argno].arg_type
> +		  = SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP;
> +		clone_info->args[argno].linear_step = tree_to_shwi (step);
> +		gcc_assert (clone_info->args[argno].linear_step >= 0
> +			    && clone_info->args[argno].linear_step < n);
> +	      }
> +	    else
> +	      {
> +		if (POINTER_TYPE_P (args[argno]))
> +		  step = fold_convert (ssizetype, step);
> +		if (!tree_fits_shwi_p (step))
> +		  {
> +		    warning_at (OMP_CLAUSE_LOCATION (t), 0,
> +				"ignoring large linear step");
> +		    args.release ();
> +		    return NULL;
> +		  }
> +		else if (integer_zerop (step))
> +		  {
> +		    warning_at (OMP_CLAUSE_LOCATION (t), 0,
> +				"ignoring zero linear step");
> +		    args.release ();
> +		    return NULL;
> +		  }
> +		else
> +		  {
> +		    clone_info->args[argno].arg_type
> +		      = SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP;
> +		    clone_info->args[argno].linear_step = tree_to_shwi (step);
> +		  }
> +	      }
> +	    break;
> +	  }
> +	case OMP_CLAUSE_UNIFORM:
> +	  {
> +	    tree decl = OMP_CLAUSE_DECL (t);
> +	    int argno = tree_to_uhwi (decl);
> +	    clone_info->args[argno].arg_type
> +	      = SIMD_CLONE_ARG_TYPE_UNIFORM;
> +	    break;
> +	  }
> +	case OMP_CLAUSE_ALIGNED:
> +	  {
> +	    tree decl = OMP_CLAUSE_DECL (t);
> +	    int argno = tree_to_uhwi (decl);
> +	    clone_info->args[argno].alignment
> +	      = TREE_INT_CST_LOW (OMP_CLAUSE_ALIGNED_ALIGNMENT (t));
> +	    break;
> +	  }
> +	default:
> +	  break;
> +	}
> +    }
> +  args.release ();
> +  return clone_info;
> +}
> +
> +/* Given a SIMD clone in NODE, calculate the characteristic data
> +   type and return the coresponding type.  The characteristic data
> +   type is computed as described in the Intel Vector ABI.  */
> +
> +static tree
> +simd_clone_compute_base_data_type (struct cgraph_node *node,
> +				   struct cgraph_simd_clone *clone_info)
> +{
> +  tree type = integer_type_node;
> +  tree fndecl = node->decl;
> +
> +  /* a) For non-void function, the characteristic data type is the
> +        return type.  */
> +  if (TREE_CODE (TREE_TYPE (TREE_TYPE (fndecl))) != VOID_TYPE)
> +    type = TREE_TYPE (TREE_TYPE (fndecl));
> +
> +  /* b) If the function has any non-uniform, non-linear parameters,
> +        then the characteristic data type is the type of the first
> +        such parameter.  */
> +  else
> +    {
> +      vec<tree> map = simd_clone_vector_of_formal_parm_types (fndecl);
> +      for (unsigned int i = 0; i < clone_info->nargs; ++i)
> +	if (clone_info->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
> +	  {
> +	    type = map[i];
> +	    break;
> +	  }
> +      map.release ();
> +    }
> +
> +  /* c) If the characteristic data type determined by a) or b) above
> +        is struct, union, or class type which is pass-by-value (except
> +        for the type that maps to the built-in complex data type), the
> +        characteristic data type is int.  */
> +  if (RECORD_OR_UNION_TYPE_P (type)
> +      && !aggregate_value_p (type, NULL)
> +      && TREE_CODE (type) != COMPLEX_TYPE)
> +    return integer_type_node;
> +
> +  /* d) If none of the above three classes is applicable, the
> +        characteristic data type is int.  */
> +
> +  return type;
> +
> +  /* e) For Intel Xeon Phi native and offload compilation, if the
> +        resulting characteristic data type is 8-bit or 16-bit integer
> +        data type, the characteristic data type is int.  */
> +  /* Well, we don't handle Xeon Phi yet.  */
> +}
> +
> +static tree
> +simd_clone_mangle (struct cgraph_node *node,
> +		   struct cgraph_simd_clone *clone_info)
> +{
> +  char vecsize_mangle = clone_info->vecsize_mangle;
> +  char mask = clone_info->inbranch ? 'M' : 'N';
> +  unsigned int simdlen = clone_info->simdlen;
> +  unsigned int n;
> +  pretty_printer pp;
> +
> +  gcc_assert (vecsize_mangle && simdlen);
> +
> +  pp_string (&pp, "_ZGV");
> +  pp_character (&pp, vecsize_mangle);
> +  pp_character (&pp, mask);
> +  pp_decimal_int (&pp, simdlen);
> +
> +  for (n = 0; n < clone_info->nargs; ++n)
> +    {
> +      struct cgraph_simd_clone_arg arg = clone_info->args[n];
> +
> +      if (arg.arg_type == SIMD_CLONE_ARG_TYPE_UNIFORM)
> +	pp_character (&pp, 'u');
> +      else if (arg.arg_type == SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP)
> +	{
> +	  gcc_assert (arg.linear_step != 0);
> +	  pp_character (&pp, 'l');
> +	  if (arg.linear_step > 1)
> +	    pp_unsigned_wide_integer (&pp, arg.linear_step);
> +	  else if (arg.linear_step < 0)
> +	    {
> +	      pp_character (&pp, 'n');
> +	      pp_unsigned_wide_integer (&pp, (-(unsigned HOST_WIDE_INT)
> +					      arg.linear_step));
> +	    }
> +	}
> +      else if (arg.arg_type == SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP)
> +	{
> +	  pp_character (&pp, 's');
> +	  pp_unsigned_wide_integer (&pp, arg.linear_step);
> +	}
> +      else
> +	pp_character (&pp, 'v');
> +      if (arg.alignment)
> +	{
> +	  pp_character (&pp, 'a');
> +	  pp_decimal_int (&pp, arg.alignment);
> +	}
> +    }
> +
> +  pp_underscore (&pp);
> +  pp_string (&pp,
> +	     IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl)));
> +  const char *str = pp_formatted_text (&pp);
> +
> +  /* If there already is a SIMD clone with the same mangled name, don't
> +     add another one.  This can happen e.g. for
> +     #pragma omp declare simd
> +     #pragma omp declare simd simdlen(8)
> +     int foo (int, int);
> +     if the simdlen is assumed to be 8 for the first one, etc.  */
> +  for (struct cgraph_node *clone = node->simd_clones; clone;
> +       clone = clone->simdclone->next_clone)
> +    if (strcmp (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (clone->decl)),
> +		str) == 0)
> +      return NULL_TREE;
> +
> +  return get_identifier (str);
> +}
> +
> +/* Create a simd clone of OLD_NODE and return it.  */
> +
> +static struct cgraph_node *
> +simd_clone_create (struct cgraph_node *old_node)
> +{
> +  struct cgraph_node *new_node;
> +  if (old_node->definition)
> +    new_node = cgraph_function_versioning (old_node, vNULL, NULL, NULL, false,
> +					   NULL, NULL, "simdclone");
> +  else
> +    {
> +      tree old_decl = old_node->decl;
> +      tree new_decl = copy_node (old_node->decl);
> +      DECL_NAME (new_decl) = clone_function_name (old_decl, "simdclone");
> +      SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
> +      SET_DECL_RTL (new_decl, NULL);
> +      DECL_STATIC_CONSTRUCTOR (new_decl) = 0;
> +      DECL_STATIC_DESTRUCTOR (new_decl) = 0;
> +      new_node
> +	= cgraph_copy_node_for_versioning (old_node, new_decl, vNULL, NULL);
> +      cgraph_call_function_insertion_hooks (new_node);
> +    }
> +  if (new_node == NULL)
> +    return new_node;
> +
> +  TREE_PUBLIC (new_node->decl) = TREE_PUBLIC (old_node->decl);
> +
> +  /* The function cgraph_function_versioning () will force the new
> +     symbol local.  Undo this, and inherit external visability from
> +     the old node.  */
> +  new_node->local.local = old_node->local.local;
> +  new_node->externally_visible = old_node->externally_visible;
> +
> +  return new_node;
> +}
> +
> +/* Adjust the return type of the given function to its appropriate
> +   vector counterpart.  Returns a simd array to be used throughout the
> +   function as a return value.  */
> +
> +static tree
> +simd_clone_adjust_return_type (struct cgraph_node *node)
> +{
> +  tree fndecl = node->decl;
> +  tree orig_rettype = TREE_TYPE (TREE_TYPE (fndecl));
> +  unsigned int veclen;
> +  tree t;
> +
> +  /* Adjust the function return type.  */
> +  if (orig_rettype == void_type_node)
> +    return NULL_TREE;
> +  TREE_TYPE (fndecl) = build_distinct_type_copy (TREE_TYPE (fndecl));
> +  if (INTEGRAL_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl)))
> +      || POINTER_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl))))
> +    veclen = node->simdclone->vecsize_int;
> +  else
> +    veclen = node->simdclone->vecsize_float;
> +  veclen /= GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (fndecl))));
> +  if (veclen > node->simdclone->simdlen)
> +    veclen = node->simdclone->simdlen;
> +  if (veclen == node->simdclone->simdlen)
> +    TREE_TYPE (TREE_TYPE (fndecl))
> +      = build_vector_type (TREE_TYPE (TREE_TYPE (fndecl)),
> +			   node->simdclone->simdlen);
> +  else
> +    {
> +      t = build_vector_type (TREE_TYPE (TREE_TYPE (fndecl)), veclen);
> +      t = build_array_type_nelts (t, node->simdclone->simdlen / veclen);
> +      TREE_TYPE (TREE_TYPE (fndecl)) = t;
> +    }
> +  if (!node->definition)
> +    return NULL_TREE;
> +
> +  t = DECL_RESULT (fndecl);
> +  /* Adjust the DECL_RESULT.  */
> +  gcc_assert (TREE_TYPE (t) != void_type_node);
> +  TREE_TYPE (t) = TREE_TYPE (TREE_TYPE (fndecl));
> +  relayout_decl (t);
> +
> +  tree atype = build_array_type_nelts (orig_rettype,
> +				       node->simdclone->simdlen);
> +  if (veclen != node->simdclone->simdlen)
> +    return build1 (VIEW_CONVERT_EXPR, atype, t);
> +
> +  /* Set up a SIMD array to use as the return value.  */
> +  tree retval = create_tmp_var_raw (atype, "retval");
> +  gimple_add_tmp_var (retval);
> +  return retval;
> +}
> +
> +/* Each vector argument has a corresponding array to be used locally
> +   as part of the eventual loop.  Create such temporary array and
> +   return it.
> +
> +   PREFIX is the prefix to be used for the temporary.
> +
> +   TYPE is the inner element type.
> +
> +   SIMDLEN is the number of elements.  */
> +
> +static tree
> +create_tmp_simd_array (const char *prefix, tree type, int simdlen)
> +{
> +  tree atype = build_array_type_nelts (type, simdlen);
> +  tree avar = create_tmp_var_raw (atype, prefix);
> +  gimple_add_tmp_var (avar);
> +  return avar;
> +}
> +
> +/* Modify the function argument types to their corresponding vector
> +   counterparts if appropriate.  Also, create one array for each simd
> +   argument to be used locally when using the function arguments as
> +   part of the loop.
> +
> +   NODE is the function whose arguments are to be adjusted.
> +
> +   Returns an adjustment vector that will be filled describing how the
> +   argument types will be adjusted.  */
> +
> +static ipa_parm_adjustment_vec
> +simd_clone_adjust_argument_types (struct cgraph_node *node)
> +{
> +  vec<tree> args;
> +  ipa_parm_adjustment_vec adjustments;
> +
> +  if (node->definition)
> +    args = ipa_get_vector_of_formal_parms (node->decl);
> +  else
> +    args = simd_clone_vector_of_formal_parm_types (node->decl);
> +  adjustments.create (args.length ());
> +  unsigned i, j, veclen;
> +  struct ipa_parm_adjustment adj;
> +  for (i = 0; i < node->simdclone->nargs; ++i)
> +    {
> +      memset (&adj, 0, sizeof (adj));
> +      tree parm = args[i];
> +      tree parm_type = node->definition ? TREE_TYPE (parm) : parm;
> +      adj.base_index = i;
> +      adj.base = parm;
> +
> +      node->simdclone->args[i].orig_arg = node->definition ? parm : NULL_TREE;
> +      node->simdclone->args[i].orig_type = parm_type;
> +
> +      if (node->simdclone->args[i].arg_type != SIMD_CLONE_ARG_TYPE_VECTOR)
> +	{
> +	  /* No adjustment necessary for scalar arguments.  */
> +	  adj.op = IPA_PARM_OP_COPY;
> +	}
> +      else
> +	{
> +	  if (INTEGRAL_TYPE_P (parm_type) || POINTER_TYPE_P (parm_type))
> +	    veclen = node->simdclone->vecsize_int;
> +	  else
> +	    veclen = node->simdclone->vecsize_float;
> +	  veclen /= GET_MODE_BITSIZE (TYPE_MODE (parm_type));
> +	  if (veclen > node->simdclone->simdlen)
> +	    veclen = node->simdclone->simdlen;
> +	  adj.simdlen = veclen;
> +	  adj.arg_prefix = "simd";
> +	  if (POINTER_TYPE_P (parm_type))
> +	    adj.by_ref = 1;
> +	  adj.type = parm_type;
> +	  node->simdclone->args[i].vector_type
> +	    = build_vector_type (parm_type, veclen);
> +	  for (j = veclen; j < node->simdclone->simdlen; j += veclen)
> +	    {
> +	      adjustments.safe_push (adj);
> +	      if (j == veclen)
> +		{
> +		  memset (&adj, 0, sizeof (adj));
> +		  adj.op = IPA_PARM_OP_NEW;
> +		  adj.arg_prefix = "simd";
> +		  adj.base_index = i;
> +		  adj.type = node->simdclone->args[i].vector_type;
> +		}
> +	    }
> +
> +	  if (node->definition)
> +	    node->simdclone->args[i].simd_array
> +	      = create_tmp_simd_array (IDENTIFIER_POINTER (DECL_NAME (parm)),
> +				       parm_type, node->simdclone->simdlen);
> +	}
> +      adjustments.safe_push (adj);
> +    }
> +
> +  if (node->simdclone->inbranch)
> +    {
> +      tree base_type
> +	= simd_clone_compute_base_data_type (node->simdclone->origin,
> +					     node->simdclone);
> +
> +      memset (&adj, 0, sizeof (adj));
> +      adj.op = IPA_PARM_OP_NEW;
> +      adj.arg_prefix = "mask";
> +
> +      adj.base_index = i;
> +      if (INTEGRAL_TYPE_P (base_type) || POINTER_TYPE_P (base_type))
> +	veclen = node->simdclone->vecsize_int;
> +      else
> +	veclen = node->simdclone->vecsize_float;
> +      veclen /= GET_MODE_BITSIZE (TYPE_MODE (base_type));
> +      if (veclen > node->simdclone->simdlen)
> +	veclen = node->simdclone->simdlen;
> +      adj.type = build_vector_type (base_type, veclen);
> +      adjustments.safe_push (adj);
> +
> +      for (j = veclen; j < node->simdclone->simdlen; j += veclen)
> +	adjustments.safe_push (adj);
> +
> +      /* We have previously allocated one extra entry for the mask.  Use
> +	 it and fill it.  */
> +      struct cgraph_simd_clone *sc = node->simdclone;
> +      sc->nargs++;
> +      if (node->definition)
> +	{
> +	  sc->args[i].orig_arg
> +	    = build_decl (UNKNOWN_LOCATION, PARM_DECL, NULL, base_type);
> +	  sc->args[i].simd_array
> +	    = create_tmp_simd_array ("mask", base_type, sc->simdlen);
> +	}
> +      sc->args[i].orig_type = base_type;
> +      sc->args[i].arg_type = SIMD_CLONE_ARG_TYPE_MASK;
> +    }
> +
> +  if (node->definition)
> +    ipa_modify_formal_parameters (node->decl, adjustments);
> +  else
> +    {
> +      tree new_arg_types = NULL_TREE, new_reversed;
> +      bool last_parm_void = false;
> +      if (args.length () > 0 && args.last () == void_type_node)
> +	last_parm_void = true;
> +
> +      gcc_assert (TYPE_ARG_TYPES (TREE_TYPE (node->decl)));
> +      j = adjustments.length ();
> +      for (i = 0; i < j; i++)
> +	{
> +	  struct ipa_parm_adjustment *adj = &adjustments[i];
> +	  tree ptype;
> +	  if (adj->op == IPA_PARM_OP_COPY)
> +	    ptype = args[adj->base_index];
> +	  else if (adj->simdlen)
> +	    ptype = build_vector_type (adj->type, adj->simdlen);
> +	  else
> +	    ptype = adj->type;
> +	  new_arg_types = tree_cons (NULL_TREE, ptype, new_arg_types);
> +	}
> +      new_reversed = nreverse (new_arg_types);
> +      if (last_parm_void)
> +	{
> +	  if (new_reversed)
> +	    TREE_CHAIN (new_arg_types) = void_list_node;
> +	  else
> +	    new_reversed = void_list_node;
> +	}
> +
> +      tree new_type = build_distinct_type_copy (TREE_TYPE (node->decl));
> +      TYPE_ARG_TYPES (new_type) = new_reversed;
> +      TREE_TYPE (node->decl) = new_type;
> +
> +      adjustments.release ();
> +    }
> +  args.release ();
> +  return adjustments;
> +}
> +
> +/* Initialize and copy the function arguments in NODE to their
> +   corresponding local simd arrays.  Returns a fresh gimple_seq with
> +   the instruction sequence generated.  */
> +
> +static gimple_seq
> +simd_clone_init_simd_arrays (struct cgraph_node *node,
> +			     ipa_parm_adjustment_vec adjustments)
> +{
> +  gimple_seq seq = NULL;
> +  unsigned i = 0, j = 0, k;
> +
> +  for (tree arg = DECL_ARGUMENTS (node->decl);
> +       arg;
> +       arg = DECL_CHAIN (arg), i++, j++)
> +    {
> +      if (adjustments[j].op == IPA_PARM_OP_COPY)
> +	continue;
> +
> +      node->simdclone->args[i].vector_arg = arg;
> +
> +      tree array = node->simdclone->args[i].simd_array;
> +      if ((unsigned) adjustments[j].simdlen == node->simdclone->simdlen)
> +	{
> +	  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
> +	  tree ptr = build_fold_addr_expr (array);
> +	  tree t = build2 (MEM_REF, TREE_TYPE (arg), ptr,
> +			   build_int_cst (ptype, 0));
> +	  t = build2 (MODIFY_EXPR, TREE_TYPE (t), t, arg);
> +	  gimplify_and_add (t, &seq);
> +	}
> +      else
> +	{
> +	  unsigned int simdlen = adjustments[j].simdlen;
> +	  if (node->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK)
> +	    simdlen = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg));
> +	  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
> +	  for (k = 0; k < node->simdclone->simdlen; k += simdlen)
> +	    {
> +	      tree ptr = build_fold_addr_expr (array);
> +	      int elemsize;
> +	      if (k)
> +		{
> +		  arg = DECL_CHAIN (arg);
> +		  j++;
> +		}
> +	      elemsize
> +		= GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (arg))));
> +	      tree t = build2 (MEM_REF, TREE_TYPE (arg), ptr,
> +			       build_int_cst (ptype, k * elemsize));
> +	      t = build2 (MODIFY_EXPR, TREE_TYPE (t), t, arg);
> +	      gimplify_and_add (t, &seq);
> +	    }
> +	}
> +    }
> +  return seq;
> +}
> +
> +/* Callback info for ipa_simd_modify_stmt_ops below.  */
> +
> +struct modify_stmt_info {
> +  ipa_parm_adjustment_vec adjustments;
> +  gimple stmt;
> +  /* True if the parent statement was modified by
> +     ipa_simd_modify_stmt_ops.  */
> +  bool modified;
> +};
> +
> +/* Callback for walk_gimple_op.
> +
> +   Adjust operands from a given statement as specified in the
> +   adjustments vector in the callback data.  */
> +
> +static tree
> +ipa_simd_modify_stmt_ops (tree *tp, int *walk_subtrees, void *data)
> +{
> +  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
> +  if (!SSA_VAR_P (*tp))
> +    {
> +      /* Make sure we treat subtrees as a RHS.  This makes sure that
> +	 when examining the `*foo' in *foo=x, the `foo' get treated as
> +	 a use properly.  */
> +      wi->is_lhs = false;
> +      wi->val_only = true;
> +      if (TYPE_P (*tp))
> +	*walk_subtrees = 0;
> +      return NULL_TREE;
> +    }
> +  struct modify_stmt_info *info = (struct modify_stmt_info *) wi->info;
> +  struct ipa_parm_adjustment *cand
> +    = ipa_get_adjustment_candidate (&tp, NULL, info->adjustments, true);
> +  if (!cand)
> +    return NULL_TREE;
> +
> +  tree t = *tp;
> +  tree repl = make_ssa_name (TREE_TYPE (t), NULL);
> +
> +  gimple stmt;
> +  gimple_stmt_iterator gsi = gsi_for_stmt (info->stmt);
> +  if (wi->is_lhs)
> +    {
> +      stmt = gimple_build_assign (unshare_expr (cand->new_decl), repl);
> +      gsi_insert_after (&gsi, stmt, GSI_SAME_STMT);
> +      SSA_NAME_DEF_STMT (repl) = info->stmt;
> +    }
> +  else
> +    {
> +      /* You'd think we could skip the extra SSA variable when
> +	 wi->val_only=true, but we may have `*var' which will get
> +	 replaced into `*var_array[iter]' and will likely be something
> +	 not gimple.  */
> +      stmt = gimple_build_assign (repl, unshare_expr (cand->new_decl));
> +      gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
> +    }
> +
> +  if (!useless_type_conversion_p (TREE_TYPE (*tp), TREE_TYPE (repl)))
> +    {
> +      tree vce = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (*tp), repl);
> +      *tp = vce;
> +    }
> +  else
> +    *tp = repl;
> +
> +  info->modified = true;
> +  wi->is_lhs = false;
> +  wi->val_only = true;
> +  return NULL_TREE;
> +}
> +
> +/* Traverse the function body and perform all modifications as
> +   described in ADJUSTMENTS.  At function return, ADJUSTMENTS will be
> +   modified such that the replacement/reduction value will now be an
> +   offset into the corresponding simd_array.
> +
> +   This function will replace all function argument uses with their
> +   corresponding simd array elements, and ajust the return values
> +   accordingly.  */
> +
> +static void
> +ipa_simd_modify_function_body (struct cgraph_node *node,
> +			       ipa_parm_adjustment_vec adjustments,
> +			       tree retval_array, tree iter)
> +{
> +  basic_block bb;
> +  unsigned int i, j;
> +
> +  /* Re-use the adjustments array, but this time use it to replace
> +     every function argument use to an offset into the corresponding
> +     simd_array.  */
> +  for (i = 0, j = 0; i < node->simdclone->nargs; ++i, ++j)
> +    {
> +      if (!node->simdclone->args[i].vector_arg)
> +	continue;
> +
> +      tree basetype = TREE_TYPE (node->simdclone->args[i].orig_arg);
> +      adjustments[j].new_decl
> +	= build4 (ARRAY_REF,
> +		  basetype,
> +		  node->simdclone->args[i].simd_array,
> +		  iter,
> +		  NULL_TREE, NULL_TREE);
> +      if (adjustments[j].op == IPA_PARM_OP_NONE
> +	  && (unsigned) adjustments[j].simdlen < node->simdclone->simdlen)
> +	j += node->simdclone->simdlen / adjustments[j].simdlen - 1;
> +    }
> +
> +  struct modify_stmt_info info;
> +  info.adjustments = adjustments;
> +
> +  FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (node->decl))
> +    {
> +      gimple_stmt_iterator gsi;
> +
> +      gsi = gsi_start_bb (bb);
> +      while (!gsi_end_p (gsi))
> +	{
> +	  gimple stmt = gsi_stmt (gsi);
> +	  info.stmt = stmt;
> +	  struct walk_stmt_info wi;
> +
> +	  memset (&wi, 0, sizeof (wi));
> +	  info.modified = false;
> +	  wi.info = &info;
> +	  walk_gimple_op (stmt, ipa_simd_modify_stmt_ops, &wi);
> +
> +	  if (gimple_code (stmt) == GIMPLE_RETURN)
> +	    {
> +	      tree retval = gimple_return_retval (stmt);
> +	      if (!retval)
> +		{
> +		  gsi_remove (&gsi, true);
> +		  continue;
> +		}
> +
> +	      /* Replace `return foo' with `retval_array[iter] = foo'.  */
> +	      tree ref = build4 (ARRAY_REF, TREE_TYPE (retval),
> +				 retval_array, iter, NULL, NULL);
> +	      stmt = gimple_build_assign (ref, retval);
> +	      gsi_replace (&gsi, stmt, true);
> +	      info.modified = true;
> +	    }
> +
> +	  if (info.modified)
> +	    {
> +	      update_stmt (stmt);
> +	      if (maybe_clean_eh_stmt (stmt))
> +		gimple_purge_dead_eh_edges (gimple_bb (stmt));
> +	    }
> +	  gsi_next (&gsi);
> +	}
> +    }
> +}
> +
> +/* Adjust the argument types in NODE to their appropriate vector
> +   counterparts.  */
> +
> +static void
> +simd_clone_adjust (struct cgraph_node *node)
> +{
> +  push_cfun (DECL_STRUCT_FUNCTION (node->decl));
> +
> +  targetm.simd_clone.adjust (node);
> +
> +  tree retval = simd_clone_adjust_return_type (node);
> +  ipa_parm_adjustment_vec adjustments
> +    = simd_clone_adjust_argument_types (node);
> +
> +  push_gimplify_context ();
> +
> +  gimple_seq seq = simd_clone_init_simd_arrays (node, adjustments);
> +
> +  /* Adjust all uses of vector arguments accordingly.  Adjust all
> +     return values accordingly.  */
> +  tree iter = create_tmp_var (unsigned_type_node, "iter");
> +  tree iter1 = make_ssa_name (iter, NULL);
> +  tree iter2 = make_ssa_name (iter, NULL);
> +  ipa_simd_modify_function_body (node, adjustments, retval, iter1);
> +
> +  /* Initialize the iteration variable.  */
> +  basic_block entry_bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> +  basic_block body_bb = split_block_after_labels (entry_bb)->dest;
> +  gimple_stmt_iterator gsi = gsi_after_labels (entry_bb);
> +  /* Insert the SIMD array and iv initialization at function
> +     entry.  */
> +  gsi_insert_seq_before (&gsi, seq, GSI_NEW_STMT);
> +
> +  pop_gimplify_context (NULL);
> +
> +  /* Create a new BB right before the original exit BB, to hold the
> +     iteration increment and the condition/branch.  */
> +  basic_block orig_exit = EDGE_PRED (EXIT_BLOCK_PTR_FOR_FN (cfun), 0)->src;
> +  basic_block incr_bb = create_empty_bb (orig_exit);
> +  /* The succ of orig_exit was EXIT_BLOCK_PTR_FOR_FN (cfun), with an empty
> +     flag.  Set it now to be a FALLTHRU_EDGE.  */
> +  gcc_assert (EDGE_COUNT (orig_exit->succs) == 1);
> +  EDGE_SUCC (orig_exit, 0)->flags |= EDGE_FALLTHRU;
> +  for (unsigned i = 0;
> +       i < EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds); ++i)
> +    {
> +      edge e = EDGE_PRED (EXIT_BLOCK_PTR_FOR_FN (cfun), i);
> +      redirect_edge_succ (e, incr_bb);
> +    }
> +  edge e = make_edge (incr_bb, EXIT_BLOCK_PTR_FOR_FN (cfun), 0);
> +  e->probability = REG_BR_PROB_BASE;
> +  gsi = gsi_last_bb (incr_bb);
> +  gimple g = gimple_build_assign_with_ops (PLUS_EXPR, iter2, iter1,
> +					   build_int_cst (unsigned_type_node,
> +							  1));
> +  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
> +
> +  /* Mostly annotate the loop for the vectorizer (the rest is done below).  */
> +  struct loop *loop = alloc_loop ();
> +  cfun->has_force_vect_loops = true;
> +  loop->safelen = node->simdclone->simdlen;
> +  loop->force_vect = true;
> +  loop->header = body_bb;
> +  add_bb_to_loop (incr_bb, loop);
> +
> +  /* Branch around the body if the mask applies.  */
> +  if (node->simdclone->inbranch)
> +    {
> +      gimple_stmt_iterator gsi = gsi_last_bb (loop->header);
> +      tree mask_array
> +	= node->simdclone->args[node->simdclone->nargs - 1].simd_array;
> +      tree mask = make_ssa_name (TREE_TYPE (TREE_TYPE (mask_array)), NULL);
> +      tree aref = build4 (ARRAY_REF,
> +			  TREE_TYPE (TREE_TYPE (mask_array)),
> +			  mask_array, iter1,
> +			  NULL, NULL);
> +      g = gimple_build_assign (mask, aref);
> +      gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
> +      int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (aref)));
> +      if (!INTEGRAL_TYPE_P (TREE_TYPE (aref)))
> +	{
> +	  aref = build1 (VIEW_CONVERT_EXPR,
> +			 build_nonstandard_integer_type (bitsize, 0), mask);
> +	  mask = make_ssa_name (TREE_TYPE (aref), NULL);
> +	  g = gimple_build_assign (mask, aref);
> +	  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
> +	}
> +
> +      g = gimple_build_cond (EQ_EXPR, mask, build_zero_cst (TREE_TYPE (mask)),
> +			     NULL, NULL);
> +      gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
> +      make_edge (loop->header, incr_bb, EDGE_TRUE_VALUE);
> +      FALLTHRU_EDGE (loop->header)->flags = EDGE_FALSE_VALUE;
> +    }
> +
> +  /* Generate the condition.  */
> +  g = gimple_build_cond (LT_EXPR,
> +			 iter2,
> +			 build_int_cst (unsigned_type_node,
> +					node->simdclone->simdlen),
> +			 NULL, NULL);
> +  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
> +  e = split_block (incr_bb, gsi_stmt (gsi));
> +  basic_block latch_bb = e->dest;
> +  basic_block new_exit_bb = e->dest;
> +  new_exit_bb = split_block (latch_bb, NULL)->dest;
> +  loop->latch = latch_bb;
> +
> +  redirect_edge_succ (FALLTHRU_EDGE (latch_bb), body_bb);
> +
> +  make_edge (incr_bb, new_exit_bb, EDGE_FALSE_VALUE);
> +  /* The successor of incr_bb is already pointing to latch_bb; just
> +     change the flags.
> +     make_edge (incr_bb, latch_bb, EDGE_TRUE_VALUE);  */
> +  FALLTHRU_EDGE (incr_bb)->flags = EDGE_TRUE_VALUE;
> +
> +  gimple phi = create_phi_node (iter1, body_bb);
> +  edge preheader_edge = find_edge (entry_bb, body_bb);
> +  edge latch_edge = single_succ_edge (latch_bb);
> +  add_phi_arg (phi, build_zero_cst (unsigned_type_node), preheader_edge,
> +	       UNKNOWN_LOCATION);
> +  add_phi_arg (phi, iter2, latch_edge, UNKNOWN_LOCATION);
> +
> +  /* Generate the new return.  */
> +  gsi = gsi_last_bb (new_exit_bb);
> +  if (retval
> +      && TREE_CODE (retval) == VIEW_CONVERT_EXPR
> +      && TREE_CODE (TREE_OPERAND (retval, 0)) == RESULT_DECL)
> +    retval = TREE_OPERAND (retval, 0);
> +  else if (retval)
> +    {
> +      retval = build1 (VIEW_CONVERT_EXPR,
> +		       TREE_TYPE (TREE_TYPE (node->decl)),
> +		       retval);
> +      retval = force_gimple_operand_gsi (&gsi, retval, true, NULL,
> +					 false, GSI_CONTINUE_LINKING);
> +    }
> +  g = gimple_build_return (retval);
> +  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
> +
> +  /* Handle aligned clauses by replacing default defs of the aligned
> +     uniform args with __builtin_assume_aligned (arg_N(D), alignment)
> +     lhs.  Handle linear by adding PHIs.  */
> +  for (unsigned i = 0; i < node->simdclone->nargs; i++)
> +    if (node->simdclone->args[i].alignment
> +	&& node->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_UNIFORM
> +	&& (node->simdclone->args[i].alignment
> +	    & (node->simdclone->args[i].alignment - 1)) == 0
> +	&& TREE_CODE (TREE_TYPE (node->simdclone->args[i].orig_arg))
> +	   == POINTER_TYPE)
> +      {
> +	unsigned int alignment = node->simdclone->args[i].alignment;
> +	tree orig_arg = node->simdclone->args[i].orig_arg;
> +	tree def = ssa_default_def (cfun, orig_arg);
> +	if (!has_zero_uses (def))
> +	  {
> +	    tree fn = builtin_decl_explicit (BUILT_IN_ASSUME_ALIGNED);
> +	    gimple_seq seq = NULL;
> +	    bool need_cvt = false;
> +	    gimple call
> +	      = gimple_build_call (fn, 2, def, size_int (alignment));
> +	    g = call;
> +	    if (!useless_type_conversion_p (TREE_TYPE (orig_arg),
> +					    ptr_type_node))
> +	      need_cvt = true;
> +	    tree t = make_ssa_name (need_cvt ? ptr_type_node : orig_arg, NULL);
> +	    gimple_call_set_lhs (g, t);
> +	    gimple_seq_add_stmt_without_update (&seq, g);
> +	    if (need_cvt)
> +	      {
> +		t = make_ssa_name (orig_arg, NULL);
> +		g = gimple_build_assign_with_ops (NOP_EXPR, t,
> +						  gimple_call_lhs (g),
> +						  NULL_TREE);
> +		gimple_seq_add_stmt_without_update (&seq, g);
> +	      }
> +	    gsi_insert_seq_on_edge_immediate
> +	      (single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)), seq);
> +
> +	    entry_bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> +	    int freq = compute_call_stmt_bb_frequency (current_function_decl,
> +						       entry_bb);
> +	    cgraph_create_edge (node, cgraph_get_create_node (fn),
> +				call, entry_bb->count, freq);
> +
> +	    imm_use_iterator iter;
> +	    use_operand_p use_p;
> +	    gimple use_stmt;
> +	    tree repl = gimple_get_lhs (g);
> +	    FOR_EACH_IMM_USE_STMT (use_stmt, iter, def)
> +	      if (is_gimple_debug (use_stmt) || use_stmt == call)
> +		continue;
> +	      else
> +		FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
> +		  SET_USE (use_p, repl);
> +	  }
> +      }
> +    else if (node->simdclone->args[i].arg_type
> +	     == SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP)
> +      {
> +	tree orig_arg = node->simdclone->args[i].orig_arg;
> +	tree def = ssa_default_def (cfun, orig_arg);
> +	gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (orig_arg))
> +		    || POINTER_TYPE_P (TREE_TYPE (orig_arg)));
> +	if (!has_zero_uses (def))
> +	  {
> +	    iter1 = make_ssa_name (orig_arg, NULL);
> +	    iter2 = make_ssa_name (orig_arg, NULL);
> +	    phi = create_phi_node (iter1, body_bb);
> +	    add_phi_arg (phi, def, preheader_edge, UNKNOWN_LOCATION);
> +	    add_phi_arg (phi, iter2, latch_edge, UNKNOWN_LOCATION);
> +	    enum tree_code code = INTEGRAL_TYPE_P (TREE_TYPE (orig_arg))
> +				  ? PLUS_EXPR : POINTER_PLUS_EXPR;
> +	    tree addtype = INTEGRAL_TYPE_P (TREE_TYPE (orig_arg))
> +			   ? TREE_TYPE (orig_arg) : sizetype;
> +	    tree addcst
> +	      = build_int_cst (addtype, node->simdclone->args[i].linear_step);
> +	    g = gimple_build_assign_with_ops (code, iter2, iter1, addcst);
> +	    gsi = gsi_last_bb (incr_bb);
> +	    gsi_insert_before (&gsi, g, GSI_SAME_STMT);
> +
> +	    imm_use_iterator iter;
> +	    use_operand_p use_p;
> +	    gimple use_stmt;
> +	    FOR_EACH_IMM_USE_STMT (use_stmt, iter, def)
> +	      if (use_stmt == phi)
> +		continue;
> +	      else
> +		FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
> +		  SET_USE (use_p, iter1);
> +	  }
> +      }
> +
> +  calculate_dominance_info (CDI_DOMINATORS);
> +  add_loop (loop, loop->header->loop_father);
> +  update_ssa (TODO_update_ssa);
> +
> +  pop_cfun ();
> +}
> +
> +/* If the function in NODE is tagged as an elemental SIMD function,
> +   create the appropriate SIMD clones.  */
> +
> +static void
> +expand_simd_clones (struct cgraph_node *node)
> +{
> +  if (lookup_attribute ("noclone", DECL_ATTRIBUTES (node->decl)))
> +    return;
> +
> +  tree attr = lookup_attribute ("omp declare simd",
> +				DECL_ATTRIBUTES (node->decl));
> +  if (!attr || targetm.simd_clone.compute_vecsize_and_simdlen == NULL)
> +    return;
> +  /* Ignore
> +     #pragma omp declare simd
> +     extern int foo ();
> +     in C, there we don't know the argument types at all.  */
> +  if (!node->definition
> +      && TYPE_ARG_TYPES (TREE_TYPE (node->decl)) == NULL_TREE)
> +    return;

I wonder if you want to diagnose this case (but where?  best during
parsing if that is allowed).

> +  do
> +    {
> +      bool inbranch_clause_specified;
> +      struct cgraph_simd_clone *clone_info
> +	= simd_clone_clauses_extract (node, TREE_VALUE (attr),
> +				      &inbranch_clause_specified);
> +      if (clone_info == NULL)
> +	continue;
> +
> +      int orig_simdlen = clone_info->simdlen;
> +      tree base_type = simd_clone_compute_base_data_type (node, clone_info);
> +      int count
> +	= targetm.simd_clone.compute_vecsize_and_simdlen (node, clone_info,
> +							  base_type, 0);
> +      if (count == 0)
> +	continue;
> +
> +      for (int i = 0; i < count * 2; i++)

Here (and also elsewhere) the patch could do with a few extra
comments what is happening.

> +	{
> +	  struct cgraph_simd_clone *clone = clone_info;
> +	  if (inbranch_clause_specified && (i & 1) != 0)
> +	    continue;
> +
> +	  if (i != 0)
> +	    {
> +	      clone = simd_clone_struct_alloc (clone_info->nargs
> +					       - clone_info->inbranch
> +					       + ((i & 1) != 0));
> +	      simd_clone_struct_copy (clone, clone_info);
> +	      clone->nargs -= clone_info->inbranch;
> +	      clone->simdlen = orig_simdlen;
> +	      targetm.simd_clone.compute_vecsize_and_simdlen (node, clone,
> +							      base_type,
> +							      i / 2);
> +	      if ((i & 1) != 0)
> +		clone->inbranch = 1;
> +	    }
> +
> +	  tree id = simd_clone_mangle (node, clone);
> +	  if (id == NULL_TREE)
> +	    continue;
> +
> +	  struct cgraph_node *n = simd_clone_create (node);
> +	  if (n == NULL)
> +	    continue;
> +
> +	  n->simdclone = clone;
> +	  clone->origin = node;
> +	  clone->next_clone = NULL;
> +	  if (node->simd_clones == NULL)
> +	    {
> +	      clone->prev_clone = n;
> +	      node->simd_clones = n;
> +	    }
> +	  else
> +	    {
> +	      clone->prev_clone = node->simd_clones->simdclone->prev_clone;
> +	      clone->prev_clone->simdclone->next_clone = n;
> +	      node->simd_clones->simdclone->prev_clone = n;
> +	    }
> +	  change_decl_assembler_name (n->decl, id);
> +	  if (node->definition)
> +	    simd_clone_adjust (n);
> +	  else
> +	    {
> +	      simd_clone_adjust_return_type (n);
> +	      simd_clone_adjust_argument_types (n);
> +	    }
> +	}
> +    }
> +  while ((attr = lookup_attribute ("omp declare simd", TREE_CHAIN (attr))));
> +}
> +
> +/* Entry point for IPA simd clone creation pass.  */
> +
> +static unsigned int
> +ipa_omp_simd_clone (void)
> +{
> +  struct cgraph_node *node;
> +  FOR_EACH_FUNCTION (node)
> +    expand_simd_clones (node);
> +  return 0;
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_omp_simd_clone =
> +{
> +  SIMPLE_IPA_PASS,		/* type */
> +  "simdclone",			/* name */
> +  OPTGROUP_NONE,		/* optinfo_flags */
> +  true,				/* has_gate */
> +  true,				/* has_execute */
> +  TV_NONE,			/* tv_id */
> +  ( PROP_ssa | PROP_cfg ),	/* properties_required */
> +  0,				/* properties_provided */
> +  0,				/* properties_destroyed */
> +  0,				/* todo_flags_start */
> +  0,				/* todo_flags_finish */
> +};
> +
> +class pass_omp_simd_clone : public simple_ipa_opt_pass
> +{
> +public:
> +  pass_omp_simd_clone(gcc::context *ctxt)
> +    : simple_ipa_opt_pass(pass_data_omp_simd_clone, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  bool gate () { return flag_openmp || flag_openmp_simd
> +			|| flag_enable_cilkplus; }
> +  unsigned int execute () { return ipa_omp_simd_clone (); }
> +};
> +
> +} // anon namespace
> +
> +simple_ipa_opt_pass *
> +make_pass_omp_simd_clone (gcc::context *ctxt)
> +{
> +  return new pass_omp_simd_clone (ctxt);
> +}
>  
>  #include "gt-omp-low.h"
> --- gcc/passes.def	(.../trunk)	(revision 205223)
> +++ gcc/passes.def	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -97,6 +97,7 @@ along with GCC; see the file COPYING3.
>        NEXT_PASS (pass_feedback_split_functions);
>    POP_INSERT_PASSES ()
>    NEXT_PASS (pass_ipa_increase_alignment);
> +  NEXT_PASS (pass_omp_simd_clone);
>    NEXT_PASS (pass_ipa_tm);
>    NEXT_PASS (pass_ipa_lower_emutls);
>    TERMINATE_PASS_LIST ()

So clones are created before streaming LTO.  You do have vect.exp
testcases that are also run through -flto but does it actually
"work" there?  I remember seeing changes to cgraph unreachable
node removal based on some flag that isn't streamed, no?

> --- gcc/target.def	(.../trunk)	(revision 205223)
> +++ gcc/target.def	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -1521,6 +1521,36 @@ hook_int_uint_mode_1)
>  
>  HOOK_VECTOR_END (sched)
>  
> +/* Functions relating to OpenMP and Cilk Plus SIMD clones.  */
> +#undef HOOK_PREFIX
> +#define HOOK_PREFIX "TARGET_SIMD_CLONE_"
> +HOOK_VECTOR (TARGET_SIMD_CLONE, simd_clone)
> +
> +DEFHOOK
> +(compute_vecsize_and_simdlen,
> +"This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}\n\
> +fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also\n\
> +@var{simdlen} field if it was previously 0.\n\
> +The hook should return 0 if SIMD clones shouldn't be emitted,\n\
> +or number of @var{vecsize_mangle} variants that should be emitted.",
> +int, (struct cgraph_node *, struct cgraph_simd_clone *, tree, int), NULL)
> +
> +DEFHOOK
> +(adjust,
> +"This hook should add implicit @code{attribute(target(\"...\"))} attribute\n\
> +to SIMD clone @var{node} if needed.",
> +void, (struct cgraph_node *), NULL)
> +
> +DEFHOOK
> +(usable,
> +"This hook should return -1 if SIMD clone @var{node} shouldn't be used\n\
> +in vectorized loops in current function, or non-negative number if it is\n\
> +usable.  In that case, the smaller the number is, the more desirable it is\n\
> +to use it.",
> +int, (struct cgraph_node *), NULL)
> +
> +HOOK_VECTOR_END (simd_clone)
> +
>  /* Functions relating to vectorization.  */
>  #undef HOOK_PREFIX
>  #define HOOK_PREFIX "TARGET_VECTORIZE_"
> --- gcc/target.h	(.../trunk)	(revision 205223)
> +++ gcc/target.h	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -93,6 +93,8 @@ extern bool target_default_pointer_addre
>  struct stdarg_info;
>  struct spec_info_def;
>  struct hard_reg_set_container;
> +struct cgraph_node;
> +struct cgraph_simd_clone;
>  
>  /* The struct used by the secondary_reload target hook.  */
>  typedef struct secondary_reload_info
> --- gcc/tree-core.h	(.../trunk)	(revision 205223)
> +++ gcc/tree-core.h	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -903,6 +903,9 @@ struct GTY(()) tree_base {
>         CALL_ALLOCA_FOR_VAR_P in
>             CALL_EXPR
>  
> +       OMP_CLAUSE_LINEAR_VARIABLE_STRIDE in
> +	   OMP_CLAUSE_LINEAR
> +
>     side_effects_flag:
>  
>         TREE_SIDE_EFFECTS in
> --- gcc/tree.h	(.../trunk)	(revision 205223)
> +++ gcc/tree.h	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -1346,6 +1351,10 @@ extern void protected_set_expr_location
>  #define OMP_CLAUSE_LINEAR_NO_COPYOUT(NODE) \
>    TREE_PRIVATE (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_LINEAR))
>  
> +/* True if a LINEAR clause has a stride that is variable.  */
> +#define OMP_CLAUSE_LINEAR_VARIABLE_STRIDE(NODE) \
> +  TREE_PROTECTED (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_LINEAR))
> +
>  #define OMP_CLAUSE_LINEAR_STEP(NODE) \
>    OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_LINEAR), 1)
>  
> --- gcc/tree-pass.h	(.../trunk)	(revision 205223)
> +++ gcc/tree-pass.h	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -472,6 +472,7 @@ extern ipa_opt_pass_d *make_pass_ipa_ref
>  extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt);
>  extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt);
>  extern simple_ipa_opt_pass *make_pass_ipa_tm (gcc::context *ctxt);
> +extern simple_ipa_opt_pass *make_pass_omp_simd_clone (gcc::context *ctxt);
>  extern ipa_opt_pass_d *make_pass_ipa_profile (gcc::context *ctxt);
>  extern ipa_opt_pass_d *make_pass_ipa_cdtor_merge (gcc::context *ctxt);
>  
> --- gcc/tree-sra.c	(.../trunk)	(revision 205223)
> +++ gcc/tree-sra.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -4277,9 +4277,10 @@ turn_representatives_into_adjustments (v
>  	  adj.base_index = get_param_index (parm, parms);
>  	  adj.base = parm;
>  	  if (!repr)
> -	    adj.copy_param = 1;
> +	    adj.op = IPA_PARM_OP_COPY;
>  	  else
> -	    adj.remove_param = 1;
> +	    adj.op = IPA_PARM_OP_REMOVE;
> +	  adj.arg_prefix = "ISRA";
>  	  adjustments.quick_push (adj);
>  	}
>        else
> @@ -4299,6 +4300,7 @@ turn_representatives_into_adjustments (v
>  	      adj.by_ref = (POINTER_TYPE_P (TREE_TYPE (repr->base))
>  			    && (repr->grp_maybe_modified
>  				|| repr->grp_not_necessarilly_dereferenced));
> +	      adj.arg_prefix = "ISRA";
>  	      adjustments.quick_push (adj);
>  	    }
>  	}
> @@ -4429,7 +4431,7 @@ get_adjustment_for_base (ipa_parm_adjust
>        struct ipa_parm_adjustment *adj;
>  
>        adj = &adjustments[i];
> -      if (!adj->copy_param && adj->base == base)
> +      if (adj->op != IPA_PARM_OP_COPY && adj->base == base)
>  	return adj;
>      }
>  
> @@ -4493,84 +4495,6 @@ replace_removed_params_ssa_names (gimple
>    return true;
>  }
>  
> -/* If the expression *EXPR should be replaced by a reduction of a parameter, do
> -   so.  ADJUSTMENTS is a pointer to a vector of adjustments.  CONVERT
> -   specifies whether the function should care about type incompatibility the
> -   current and new expressions.  If it is false, the function will leave
> -   incompatibility issues to the caller.  Return true iff the expression
> -   was modified. */
> -
> -static bool
> -sra_ipa_modify_expr (tree *expr, bool convert,
> -		     ipa_parm_adjustment_vec adjustments)
> -{
> -  int i, len;
> -  struct ipa_parm_adjustment *adj, *cand = NULL;
> -  HOST_WIDE_INT offset, size, max_size;
> -  tree base, src;
> -
> -  len = adjustments.length ();
> -
> -  if (TREE_CODE (*expr) == BIT_FIELD_REF
> -      || TREE_CODE (*expr) == IMAGPART_EXPR
> -      || TREE_CODE (*expr) == REALPART_EXPR)
> -    {
> -      expr = &TREE_OPERAND (*expr, 0);
> -      convert = true;
> -    }
> -
> -  base = get_ref_base_and_extent (*expr, &offset, &size, &max_size);
> -  if (!base || size == -1 || max_size == -1)
> -    return false;
> -
> -  if (TREE_CODE (base) == MEM_REF)
> -    {
> -      offset += mem_ref_offset (base).low * BITS_PER_UNIT;
> -      base = TREE_OPERAND (base, 0);
> -    }
> -
> -  base = get_ssa_base_param (base);
> -  if (!base || TREE_CODE (base) != PARM_DECL)
> -    return false;
> -
> -  for (i = 0; i < len; i++)
> -    {
> -      adj = &adjustments[i];
> -
> -      if (adj->base == base
> -	  && (adj->offset == offset || adj->remove_param))
> -	{
> -	  cand = adj;
> -	  break;
> -	}
> -    }
> -  if (!cand || cand->copy_param || cand->remove_param)
> -    return false;
> -
> -  if (cand->by_ref)
> -    src = build_simple_mem_ref (cand->reduction);
> -  else
> -    src = cand->reduction;
> -
> -  if (dump_file && (dump_flags & TDF_DETAILS))
> -    {
> -      fprintf (dump_file, "About to replace expr ");
> -      print_generic_expr (dump_file, *expr, 0);
> -      fprintf (dump_file, " with ");
> -      print_generic_expr (dump_file, src, 0);
> -      fprintf (dump_file, "\n");
> -    }
> -
> -  if (convert && !useless_type_conversion_p (TREE_TYPE (*expr), cand->type))
> -    {
> -      tree vce = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (*expr), src);
> -      *expr = vce;
> -    }
> -  else
> -    *expr = src;
> -  return true;
> -}
> -

Ah, here is the stuff moved from.  I suppose the IPA param re-org
is ok for trunk separately as well.

>  /* If the statement pointed to by STMT_PTR contains any expressions that need
>     to replaced with a different one as noted by ADJUSTMENTS, do so.  Handle any
>     potential type incompatibilities (GSI is used to accommodate conversion
> @@ -4591,8 +4515,8 @@ sra_ipa_modify_assign (gimple *stmt_ptr,
>    rhs_p = gimple_assign_rhs1_ptr (stmt);
>    lhs_p = gimple_assign_lhs_ptr (stmt);
>  
> -  any = sra_ipa_modify_expr (rhs_p, false, adjustments);
> -  any |= sra_ipa_modify_expr (lhs_p, false, adjustments);
> +  any = ipa_modify_expr (rhs_p, false, adjustments);
> +  any |= ipa_modify_expr (lhs_p, false, adjustments);
>    if (any)
>      {
>        tree new_rhs = NULL_TREE;
> @@ -4638,7 +4562,7 @@ sra_ipa_modify_assign (gimple *stmt_ptr,
>  /* Traverse the function body and all modifications as described in
>     ADJUSTMENTS.  Return true iff the CFG has been changed.  */
>  
> -static bool
> +bool
>  ipa_sra_modify_function_body (ipa_parm_adjustment_vec adjustments)
>  {
>    bool cfg_changed = false;
> @@ -4664,7 +4588,7 @@ ipa_sra_modify_function_body (ipa_parm_a
>  	    case GIMPLE_RETURN:
>  	      t = gimple_return_retval_ptr (stmt);
>  	      if (*t != NULL_TREE)
> -		modified |= sra_ipa_modify_expr (t, true, adjustments);
> +		modified |= ipa_modify_expr (t, true, adjustments);
>  	      break;
>  
>  	    case GIMPLE_ASSIGN:
> @@ -4677,13 +4601,13 @@ ipa_sra_modify_function_body (ipa_parm_a
>  	      for (i = 0; i < gimple_call_num_args (stmt); i++)
>  		{
>  		  t = gimple_call_arg_ptr (stmt, i);
> -		  modified |= sra_ipa_modify_expr (t, true, adjustments);
> +		  modified |= ipa_modify_expr (t, true, adjustments);
>  		}
>  
>  	      if (gimple_call_lhs (stmt))
>  		{
>  		  t = gimple_call_lhs_ptr (stmt);
> -		  modified |= sra_ipa_modify_expr (t, false, adjustments);
> +		  modified |= ipa_modify_expr (t, false, adjustments);
>  		  modified |= replace_removed_params_ssa_names (stmt,
>  								adjustments);
>  		}
> @@ -4693,12 +4617,12 @@ ipa_sra_modify_function_body (ipa_parm_a
>  	      for (i = 0; i < gimple_asm_ninputs (stmt); i++)
>  		{
>  		  t = &TREE_VALUE (gimple_asm_input_op (stmt, i));
> -		  modified |= sra_ipa_modify_expr (t, true, adjustments);
> +		  modified |= ipa_modify_expr (t, true, adjustments);
>  		}
>  	      for (i = 0; i < gimple_asm_noutputs (stmt); i++)
>  		{
>  		  t = &TREE_VALUE (gimple_asm_output_op (stmt, i));
> -		  modified |= sra_ipa_modify_expr (t, false, adjustments);
> +		  modified |= ipa_modify_expr (t, false, adjustments);
>  		}
>  	      break;
>  
> @@ -4744,7 +4668,7 @@ sra_ipa_reset_debug_stmts (ipa_parm_adju
>        use_operand_p use_p;
>  
>        adj = &adjustments[i];
> -      if (adj->copy_param || !is_gimple_reg (adj->base))
> +      if (adj->op == IPA_PARM_OP_COPY || !is_gimple_reg (adj->base))
>  	continue;
>        name = ssa_default_def (cfun, adj->base);
>        vexpr = NULL;
> @@ -4927,7 +4851,7 @@ modify_function (struct cgraph_node *nod
>    redirect_callers.release ();
>  
>    push_cfun (DECL_STRUCT_FUNCTION (new_node->decl));
> -  ipa_modify_formal_parameters (current_function_decl, adjustments, "ISRA");
> +  ipa_modify_formal_parameters (current_function_decl, adjustments);
>    cfg_changed = ipa_sra_modify_function_body (adjustments);
>    sra_ipa_reset_debug_stmts (adjustments);
>    convert_callers (new_node, node->decl, adjustments);
> --- gcc/tree-vect-data-refs.c	(.../trunk)	(revision 205223)
> +++ gcc/tree-vect-data-refs.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-scalar-evolution.h"
>  #include "tree-vectorizer.h"
>  #include "diagnostic-core.h"
> +#include "cgraph.h"
>  /* Need to include rtl.h, expr.h, etc. for optabs.  */
>  #include "expr.h"
>  #include "optabs.h"
> @@ -3163,10 +3164,11 @@ vect_analyze_data_refs (loop_vec_info lo
>  
>    if (loop_vinfo)
>      {
> +      basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
> +
>        loop = LOOP_VINFO_LOOP (loop_vinfo);
> -      if (!find_loop_nest (loop, &LOOP_VINFO_LOOP_NEST (loop_vinfo))
> -	  || find_data_references_in_loop
> -	       (loop, &LOOP_VINFO_DATAREFS (loop_vinfo)))
> +      datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
> +      if (!find_loop_nest (loop, &LOOP_VINFO_LOOP_NEST (loop_vinfo)))
>  	{
>  	  if (dump_enabled_p ())
>  	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -3175,7 +3177,57 @@ vect_analyze_data_refs (loop_vec_info lo
>  	  return false;
>  	}
>  
> -      datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
> +      for (i = 0; i < loop->num_nodes; i++)
> +	{
> +	  gimple_stmt_iterator gsi;
> +
> +	  for (gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi); gsi_next (&gsi))
> +	    {
> +	      gimple stmt = gsi_stmt (gsi);
> +	      if (!find_data_references_in_stmt (loop, stmt, &datarefs))
> +		{
> +		  if (is_gimple_call (stmt) && loop->safelen)
> +		    {
> +		      tree fndecl = gimple_call_fndecl (stmt), op;
> +		      if (fndecl != NULL_TREE)
> +			{
> +			  struct cgraph_node *node = cgraph_get_node (fndecl);
> +			  if (node != NULL && node->simd_clones != NULL)

So you use node->simd_clones which also need LTO streaming.

What's the reason you cannot defer SIMD cloning to LTRANS stage
as simple IPA pass next to IPA-PTA?

> +			    {
> +			      unsigned int j, n = gimple_call_num_args (stmt);
> +			      for (j = 0; j < n; j++)
> +				{
> +				  op = gimple_call_arg (stmt, j);
> +				  if (DECL_P (op)
> +				      || (REFERENCE_CLASS_P (op)
> +					  && get_base_address (op)))
> +				    break;
> +				}
> +			      op = gimple_call_lhs (stmt);
> +			      /* Ignore #pragma omp declare simd functions
> +				 if they don't have data references in the
> +				 call stmt itself.  */
> +			      if (j == n
> +				  && !(op
> +				       && (DECL_P (op)
> +					   || (REFERENCE_CLASS_P (op)
> +					       && get_base_address (op)))))
> +				continue;

Hmm.  I guess I have an idea now how to "better" support calls in
data-ref/dependence analysis.  The above is fine for now - you
might want to dump sth here if you fail because datarefs in a declare
simd fn call.

> +			    }
> +			}
> +		    }
> +		  LOOP_VINFO_DATAREFS (loop_vinfo) = datarefs;
> +		  if (dump_enabled_p ())
> +		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				     "not vectorized: loop contains function "
> +				     "calls or data references that cannot "
> +				     "be analyzed\n");
> +		  return false;
> +		}
> +	    }
> +	}
> +
> +      LOOP_VINFO_DATAREFS (loop_vinfo) = datarefs;
>      }
>    else
>      {
> --- gcc/tree-vect-loop.c	(.../trunk)	(revision 205223)
> +++ gcc/tree-vect-loop.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -373,6 +373,36 @@ vect_determine_vectorization_factor (loo
>  
>  	  if (gimple_get_lhs (stmt) == NULL_TREE)
>  	    {
> +	      if (is_gimple_call (stmt))
> +		{
> +		  /* Ignore calls with no lhs.  These must be calls to
> +		     #pragma omp simd functions, and what vectorization factor
> +		     it really needs can't be determined until
> +		     vectorizable_simd_clone_call.  */

Ick - that's bad.  Well, or rather it doesn't participate in
vectorization factor determining then, resulting in missed
vectorizations eventually.  You basically say "any vect factor is ok"
here?

> +		  if (STMT_VINFO_VECTYPE (stmt_info) == NULL_TREE)
> +		    {
> +		      unsigned int j, n = gimple_call_num_args (stmt);
> +		      for (j = 0; j < n; j++)
> +			{
> +			  scalar_type = TREE_TYPE (gimple_call_arg (stmt, j));
> +			  vectype = get_vectype_for_scalar_type (scalar_type);
> +			  if (vectype)
> +			    {
> +			      STMT_VINFO_VECTYPE (stmt_info) = vectype;
> +			      break;
> +			    }
> +			}
> +		    }
> +		  if (STMT_VINFO_VECTYPE (stmt_info) != NULL_TREE)
> +		    {
> +		      if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
> +			{
> +			  pattern_def_seq = NULL;
> +			  gsi_next (&si);
> +			}
> +		      continue;
> +		    }

Both cases above need comments - why do you chose the first param
for determining STMT_VINFO_VECTYPE?  Isn't STMT_VINFO_VECTYPE
completely irrelevant for calls w/o LHS?  Answer: yes it is!

I'd have expected an unconditional continue here (and leave
STMT_VINFO_VECTYPE == NULL - fact is that the vector type of
the argument is determined by its definition and thus may
be different from what you record here anyway).

> +		}
>  	      if (dump_enabled_p ())
>  		{
>  	          dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> --- gcc/tree-vectorizer.h	(.../trunk)	(revision 205223)
> +++ gcc/tree-vectorizer.h	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -443,6 +443,7 @@ enum stmt_vec_info_type {
>    shift_vec_info_type,
>    op_vec_info_type,
>    call_vec_info_type,
> +  call_simd_clone_vec_info_type,
>    assignment_vec_info_type,
>    condition_vec_info_type,
>    reduc_vec_info_type,
> --- gcc/tree-vect-stmts.c	(.../trunk)	(revision 205223)
> +++ gcc/tree-vect-stmts.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -42,12 +42,15 @@ along with GCC; see the file COPYING3.
>  #include "tree-ssanames.h"
>  #include "tree-ssa-loop-manip.h"
>  #include "cfgloop.h"
> +#include "tree-ssa-loop.h"
> +#include "tree-scalar-evolution.h"
>  #include "expr.h"
>  #include "recog.h"		/* FIXME: for insn_data */
>  #include "optabs.h"
>  #include "diagnostic-core.h"
>  #include "tree-vectorizer.h"
>  #include "dumpfile.h"
> +#include "cgraph.h"
>  
>  /* For lang_hooks.types.type_for_mode.  */
>  #include "langhooks.h"
> @@ -1736,7 +1739,8 @@ vectorizable_call (gimple stmt, gimple_s
>    if (!is_gimple_call (stmt))
>      return false;
>  
> -  if (TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME)
> +  if (gimple_call_lhs (stmt) == NULL_TREE
> +      || TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME)
>      return false;
>  
>    if (stmt_can_throw_internal (stmt))
> @@ -2114,6 +2118,603 @@ vectorizable_call (gimple stmt, gimple_s
>  }
>  
>  
> +struct simd_call_arg_info
> +{
> +  tree vectype;
> +  tree op;
> +  enum vect_def_type dt;
> +  HOST_WIDE_INT linear_step;
> +  unsigned int align;
> +};
> +
> +/* Function vectorizable_simd_clone_call.
> +
> +   Check if STMT performs a function call that can be vectorized
> +   by calling a simd clone of the function.
> +   If VEC_STMT is also passed, vectorize the STMT: create a vectorized
> +   stmt to replace it, put it in VEC_STMT, and insert it at BSI.
> +   Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
> +
> +static bool
> +vectorizable_simd_clone_call (gimple stmt, gimple_stmt_iterator *gsi,
> +			      gimple *vec_stmt, slp_tree slp_node)
> +{
> +  tree vec_dest;
> +  tree scalar_dest;
> +  tree op, type;
> +  tree vec_oprnd0 = NULL_TREE;
> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt), prev_stmt_info;
> +  tree vectype;
> +  unsigned int nunits;
> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> +  bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
> +  struct loop *loop = loop_vinfo ? LOOP_VINFO_LOOP (loop_vinfo) : NULL;
> +  tree fndecl, new_temp, def;
> +  gimple def_stmt;
> +  gimple new_stmt = NULL;
> +  int ncopies, j;
> +  vec<simd_call_arg_info> arginfo = vNULL;
> +  vec<tree> vargs = vNULL;
> +  size_t i, nargs;
> +  tree lhs, rtype, ratype;
> +  vec<constructor_elt, va_gc> *ret_ctor_elts;
> +
> +  /* Is STMT a vectorizable call?   */
> +  if (!is_gimple_call (stmt))
> +    return false;
> +
> +  fndecl = gimple_call_fndecl (stmt);
> +  if (fndecl == NULL_TREE)
> +    return false;
> +
> +  struct cgraph_node *node = cgraph_get_node (fndecl);
> +  if (node == NULL || node->simd_clones == NULL)
> +    return false;
> +
> +  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> +    return false;
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
> +    return false;
> +  if (gimple_call_lhs (stmt)
> +      && TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME)
> +    return false;
> +
> +  if (stmt_can_throw_internal (stmt))
> +    return false;

Can't happen (loop form checks).

> +
> +  vectype = STMT_VINFO_VECTYPE (stmt_info);

See above - questionable if this doesn't result from looking at
the LHS.

> +  if (loop_vinfo && nested_in_vect_loop_p (loop, stmt))
> +    return false;
> +
> +  /* FORNOW */
> +  if (slp_node || PURE_SLP_STMT (stmt_info))
> +    return false;
> +
> +  /* Process function arguments.  */
> +  nargs = gimple_call_num_args (stmt);
> +
> +  /* Bail out if the function has zero arguments.  */
> +  if (nargs == 0)
> +    return false;
> +
> +  arginfo.create (nargs);
> +
> +  for (i = 0; i < nargs; i++)
> +    {
> +      simd_call_arg_info thisarginfo;
> +      affine_iv iv;
> +
> +      thisarginfo.linear_step = 0;
> +      thisarginfo.align = 0;
> +      thisarginfo.op = NULL_TREE;
> +
> +      op = gimple_call_arg (stmt, i);
> +      if (!vect_is_simple_use_1 (op, stmt, loop_vinfo, bb_vinfo,
> +				 &def_stmt, &def, &thisarginfo.dt,
> +				 &thisarginfo.vectype))
> +	{
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			     "use not simple.\n");
> +	  arginfo.release ();
> +	  return false;
> +	}
> +
> +      if (thisarginfo.vectype != NULL_TREE
> +	  && loop_vinfo
> +	  && TREE_CODE (op) == SSA_NAME
> +	  && simple_iv (loop, loop_containing_stmt (stmt), op, &iv, false)
> +	  && tree_fits_shwi_p (iv.step))
> +	{
> +	  thisarginfo.linear_step = tree_to_shwi (iv.step);

Hmm, you should check thisarginfo.dt instead (I assume this case
is for induction/reduction defs)?  In this case you also should
use STMT_VINFO_LOOP_PHI_EVOLUTION_PART and not re-analyze via simple_iv.

> +	  thisarginfo.op = iv.base;
> +	}
> +      else if (thisarginfo.vectype == NULL_TREE
> +	       && POINTER_TYPE_P (TREE_TYPE (op)))
> +	thisarginfo.align = get_pointer_alignment (op) / BITS_PER_UNIT;

So this is for dt_external defs?

Please switch on thisarginfo.dt here - that more naturally explains
what you are doing (otherwise this definitely misses a comment).

> +
> +      arginfo.quick_push (thisarginfo);
> +    }
> +
> +  unsigned int badness = 0;
> +  struct cgraph_node *bestn = NULL;
> +  for (struct cgraph_node *n = node->simd_clones; n != NULL;
> +       n = n->simdclone->next_clone)
> +    {
> +      unsigned int this_badness = 0;
> +      if (n->simdclone->simdlen
> +	  > (unsigned) LOOP_VINFO_VECT_FACTOR (loop_vinfo)
> +	  || n->simdclone->nargs != nargs)
> +	continue;
> +      if (n->simdclone->simdlen
> +	  < (unsigned) LOOP_VINFO_VECT_FACTOR (loop_vinfo))
> +	this_badness += (exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo))
> +			 - exact_log2 (n->simdclone->simdlen)) * 1024;
> +      if (n->simdclone->inbranch)
> +	this_badness += 2048;
> +      int target_badness = targetm.simd_clone.usable (n);
> +      if (target_badness < 0)
> +	continue;
> +      this_badness += target_badness * 512;
> +      /* FORNOW: Have to add code to add the mask argument.  */
> +      if (n->simdclone->inbranch)
> +	continue;

We don't support if-converting calls anyway, no?

> +      for (i = 0; i < nargs; i++)
> +	{
> +	  switch (n->simdclone->args[i].arg_type)
> +	    {
> +	    case SIMD_CLONE_ARG_TYPE_VECTOR:
> +	      if (!useless_type_conversion_p
> +		     (n->simdclone->args[i].orig_type,
> +		      TREE_TYPE (gimple_call_arg (stmt, i))))
> +		i = -1;

But you don't verify the vectype against the clone vectype?

> +	      else if (arginfo[i].vectype == NULL_TREE

I'd like to see checks based on the def type, not vectype.

> +		       || arginfo[i].linear_step)
> +		this_badness += 64;
> +	      break;
> +	    case SIMD_CLONE_ARG_TYPE_UNIFORM:
> +	      if (arginfo[i].vectype != NULL_TREE)

Likewise (and below, too).

> +		i = -1;
> +	      break;
> +	    case SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP:
> +	      if (arginfo[i].vectype == NULL_TREE
> +		  || (arginfo[i].linear_step
> +		      != n->simdclone->args[i].linear_step))
> +		i = -1;
> +	      break;
> +	    case SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP:
> +	      /* FORNOW */
> +	      i = -1;
> +	      break;
> +	    case SIMD_CLONE_ARG_TYPE_MASK:
> +	      gcc_unreachable ();
> +	    }
> +	  if (i == (size_t) -1)
> +	    break;
> +	  if (n->simdclone->args[i].alignment > arginfo[i].align)
> +	    {
> +	      i = -1;
> +	      break;
> +	    }
> +	  if (arginfo[i].align)
> +	    this_badness += (exact_log2 (arginfo[i].align)
> +			     - exact_log2 (n->simdclone->args[i].alignment));
> +	}
> +      if (i == (size_t) -1)
> +	continue;
> +      if (bestn == NULL || this_badness < badness)
> +	{
> +	  bestn = n;
> +	  badness = this_badness;
> +	}
> +    }
> +
> +  if (bestn == NULL)
> +    {
> +      arginfo.release ();
> +      return false;
> +    }
> +
> +  for (i = 0; i < nargs; i++)
> +    if (arginfo[i].vectype == NULL_TREE
> +	&& bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
> +      {
> +	arginfo[i].vectype
> +	  = get_vectype_for_scalar_type (TREE_TYPE (gimple_call_arg (stmt,
> +								     i)));
> +	if (arginfo[i].vectype == NULL
> +	    || (TYPE_VECTOR_SUBPARTS (arginfo[i].vectype)
> +		> bestn->simdclone->simdlen))
> +	  {
> +	    arginfo.release ();
> +	    return false;
> +	  }
> +      }
> +
> +  fndecl = bestn->decl;
> +  nunits = bestn->simdclone->simdlen;
> +  ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
> +
> +  /* If the function isn't const, only allow it in simd loops where user
> +     has asserted that at least nunits consecutive iterations can be
> +     performed using SIMD instructions.  */
> +  if ((loop == NULL || (unsigned) loop->safelen < nunits)
> +      && gimple_vuse (stmt))
> +    {
> +      arginfo.release ();
> +      return false;
> +    }
> +
> +  /* Sanity check: make sure that at least one copy of the vectorized stmt
> +     needs to be generated.  */
> +  gcc_assert (ncopies >= 1);
> +
> +  if (!vec_stmt) /* transformation not required.  */
> +    {
> +      STMT_VINFO_TYPE (stmt_info) = call_simd_clone_vec_info_type;
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_NOTE, vect_location,
> +			 "=== vectorizable_simd_clone_call ===\n");
> +/*      vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL); */
> +      arginfo.release ();

Please save the result from the analysis (selecting the simd clone)
in the stmt_vinfo and skip the analysis during transform phase.

> +      return true;
> +    }
> +
> +  /** Transform.  **/
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location, "transform call.\n");
> +
> +  /* Handle def.  */
> +  scalar_dest = gimple_call_lhs (stmt);
> +  vec_dest = NULL_TREE;
> +  rtype = NULL_TREE;
> +  ratype = NULL_TREE;
> +  if (scalar_dest)
> +    {
> +      vec_dest = vect_create_destination_var (scalar_dest, vectype);
> +      rtype = TREE_TYPE (TREE_TYPE (fndecl));
> +      if (TREE_CODE (rtype) == ARRAY_TYPE)
> +	{
> +	  ratype = rtype;
> +	  rtype = TREE_TYPE (ratype);
> +	}
> +    }
> +
> +  prev_stmt_info = NULL;
> +  for (j = 0; j < ncopies; ++j)
> +    {
> +      /* Build argument list for the vectorized call.  */
> +      if (j == 0)
> +	vargs.create (nargs);
> +      else
> +	vargs.truncate (0);
> +
> +      for (i = 0; i < nargs; i++)
> +	{
> +	  unsigned int k, l, m, o;
> +	  tree atype;
> +	  op = gimple_call_arg (stmt, i);
> +	  switch (bestn->simdclone->args[i].arg_type)
> +	    {
> +	    case SIMD_CLONE_ARG_TYPE_VECTOR:
> +	      atype = bestn->simdclone->args[i].vector_type;
> +	      o = nunits / TYPE_VECTOR_SUBPARTS (atype);
> +	      for (m = j * o; m < (j + 1) * o; m++)
> +		{
> +		  if (TYPE_VECTOR_SUBPARTS (atype)
> +		      < TYPE_VECTOR_SUBPARTS (arginfo[i].vectype))
> +		    {
> +		      unsigned int prec = GET_MODE_BITSIZE (TYPE_MODE (atype));
> +		      k = (TYPE_VECTOR_SUBPARTS (arginfo[i].vectype)
> +			   / TYPE_VECTOR_SUBPARTS (atype));
> +		      gcc_assert ((k & (k - 1)) == 0);
> +		      if (m == 0)
> +			vec_oprnd0
> +			  = vect_get_vec_def_for_operand (op, stmt, NULL);
> +		      else
> +			{
> +			  vec_oprnd0 = arginfo[i].op;
> +			  if ((m & (k - 1)) == 0)
> +			    vec_oprnd0
> +			      = vect_get_vec_def_for_stmt_copy (arginfo[i].dt,
> +								vec_oprnd0);
> +			}
> +		      arginfo[i].op = vec_oprnd0;
> +		      vec_oprnd0
> +			= build3 (BIT_FIELD_REF, atype, vec_oprnd0,
> +				  build_int_cst (integer_type_node, prec),
> +				  build_int_cst (integer_type_node,
> +						 (m & (k - 1)) * prec));

Some helpers to build the tree to select a sub-vector would be nice
(I remember seeing this kind of pattern elsewhere).

> +		      new_stmt
> +			= gimple_build_assign_with_ops (BIT_FIELD_REF,
> +							make_ssa_name (atype,
> +								       NULL),
> +							vec_oprnd0, NULL_TREE);
> +		      vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +		      vargs.safe_push (gimple_assign_lhs (new_stmt));
> +		    }
> +		  else
> +		    {
> +		      k = (TYPE_VECTOR_SUBPARTS (atype)
> +			   / TYPE_VECTOR_SUBPARTS (arginfo[i].vectype));
> +		      gcc_assert ((k & (k - 1)) == 0);
> +		      vec<constructor_elt, va_gc> *ctor_elts;
> +		      if (k != 1)
> +			vec_alloc (ctor_elts, k);
> +		      else
> +			ctor_elts = NULL;
> +		      for (l = 0; l < k; l++)
> +			{
> +			  if (m == 0 && l == 0)
> +			    vec_oprnd0
> +			      = vect_get_vec_def_for_operand (op, stmt, NULL);
> +			  else
> +			    vec_oprnd0
> +			      = vect_get_vec_def_for_stmt_copy (arginfo[i].dt,
> +								arginfo[i].op);
> +			  arginfo[i].op = vec_oprnd0;
> +			  if (k == 1)
> +			    break;
> +			  CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE,
> +						  vec_oprnd0);
> +			}
> +		      if (k == 1)
> +			vargs.safe_push (vec_oprnd0);
> +		      else
> +			{
> +			  vec_oprnd0 = build_constructor (atype, ctor_elts);
> +			  new_stmt
> +			    = gimple_build_assign_with_ops
> +				(CONSTRUCTOR, make_ssa_name (atype, NULL),
> +				 vec_oprnd0, NULL_TREE);
> +			  vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +			  vargs.safe_push (gimple_assign_lhs (new_stmt));
> +			}
> +		    }
> +		}
> +	      break;
> +	    case SIMD_CLONE_ARG_TYPE_UNIFORM:
> +	      vargs.safe_push (op);
> +	      break;
> +	    case SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP:
> +	      if (j == 0)
> +		{
> +		  gimple_seq stmts;
> +		  arginfo[i].op
> +		    = force_gimple_operand (arginfo[i].op, &stmts, true,
> +					    NULL_TREE);
> +		  if (stmts != NULL)
> +		    {
> +		      basic_block new_bb;
> +		      edge pe = loop_preheader_edge (loop);
> +		      new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
> +		      gcc_assert (!new_bb);
> +		    }
> +		  tree phi_res = copy_ssa_name (op, NULL);
> +		  gimple new_phi = create_phi_node (phi_res, loop->header);
> +		  set_vinfo_for_stmt (new_phi,
> +				      new_stmt_vec_info (new_phi, loop_vinfo,
> +							 NULL));
> +		  add_phi_arg (new_phi, arginfo[i].op,
> +			       loop_preheader_edge (loop), UNKNOWN_LOCATION);
> +		  enum tree_code code
> +		    = POINTER_TYPE_P (TREE_TYPE (op))
> +		      ? POINTER_PLUS_EXPR : PLUS_EXPR;
> +		  tree type = POINTER_TYPE_P (TREE_TYPE (op))
> +			      ? sizetype : TREE_TYPE (op);
> +		  double_int cst
> +		    = double_int::from_shwi (arginfo[i].linear_step);
> +		  cst *= double_int::from_uhwi (ncopies * nunits);
> +		  tree tcst = double_int_to_tree (type, cst);
> +		  tree phi_arg = copy_ssa_name (op, NULL);
> +		  new_stmt = gimple_build_assign_with_ops (code, phi_arg,
> +							   phi_res, tcst);
> +		  gimple_stmt_iterator si = gsi_after_labels (loop->header);
> +		  gsi_insert_after (&si, new_stmt, GSI_NEW_STMT);
> +		  set_vinfo_for_stmt (new_stmt,
> +				      new_stmt_vec_info (new_stmt, loop_vinfo,
> +							 NULL));
> +		  add_phi_arg (new_phi, phi_arg, loop_latch_edge (loop),
> +			       UNKNOWN_LOCATION);
> +		  arginfo[i].op = phi_res;
> +		  vargs.safe_push (phi_res);
> +		}
> +	      else
> +		{
> +		  enum tree_code code
> +		    = POINTER_TYPE_P (TREE_TYPE (op))
> +		      ? POINTER_PLUS_EXPR : PLUS_EXPR;
> +		  tree type = POINTER_TYPE_P (TREE_TYPE (op))
> +			      ? sizetype : TREE_TYPE (op);
> +		  double_int cst
> +		    = double_int::from_shwi (arginfo[i].linear_step);
> +		  cst *= double_int::from_uhwi (j * nunits);
> +		  tree tcst = double_int_to_tree (type, cst);
> +		  new_temp = make_ssa_name (TREE_TYPE (op), NULL);
> +		  new_stmt
> +		    = gimple_build_assign_with_ops (code, new_temp,
> +						    arginfo[i].op, tcst);
> +		  vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +		  vargs.safe_push (new_temp);
> +		}
> +	      break;
> +	    case SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP:
> +	    default:
> +	      gcc_unreachable ();
> +	    }
> +	}
> +
> +      new_stmt = gimple_build_call_vec (fndecl, vargs);
> +      if (vec_dest)
> +	{
> +	  gcc_assert (ratype || TYPE_VECTOR_SUBPARTS (rtype) == nunits);
> +	  if (ratype)
> +	    new_temp = create_tmp_var (ratype, NULL);
> +	  else if (TYPE_VECTOR_SUBPARTS (vectype)
> +		   == TYPE_VECTOR_SUBPARTS (rtype))
> +	    new_temp = make_ssa_name (vec_dest, new_stmt);
> +	  else
> +	    new_temp = make_ssa_name (rtype, new_stmt);
> +	  gimple_call_set_lhs (new_stmt, new_temp);
> +	}
> +      vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +
> +      if (vec_dest)
> +	{
> +	  if (TYPE_VECTOR_SUBPARTS (vectype) < nunits)
> +	    {
> +	      unsigned int k, l;
> +	      unsigned int prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
> +	      k = nunits / TYPE_VECTOR_SUBPARTS (vectype);
> +	      gcc_assert ((k & (k - 1)) == 0);
> +	      for (l = 0; l < k; l++)
> +		{
> +		  tree t;
> +		  if (ratype)
> +		    {
> +		      t = build_fold_addr_expr (new_temp);
> +		      t = build2 (MEM_REF, vectype, t,
> +				  build_int_cst (TREE_TYPE (t),
> +						 l * prec / BITS_PER_UNIT));
> +		    }
> +		  else
> +		    t = build3 (BIT_FIELD_REF, vectype, new_temp,
> +				build_int_cst (integer_type_node, prec),
> +				build_int_cst (integer_type_node, l * prec));
> +		  new_stmt
> +		    = gimple_build_assign_with_ops (TREE_CODE (t),
> +						    make_ssa_name (vectype,
> +								   NULL),
> +						    t, NULL_TREE);

For SINGLE_RHS assigns I prefer gimple_build_assign.

> +		  vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +		  if (j == 0 && l == 0)
> +		    STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
> +		  else
> +		    STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
> +
> +		  prev_stmt_info = vinfo_for_stmt (new_stmt);
> +		}
> +
> +	      if (ratype)
> +		{
> +		  tree clobber = build_constructor (ratype, NULL);
> +		  TREE_THIS_VOLATILE (clobber) = 1;
> +		  new_stmt = gimple_build_assign (new_temp, clobber);
> +		  vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +		}
> +	      continue;
> +	    }
> +	  else if (TYPE_VECTOR_SUBPARTS (vectype) > nunits)
> +	    {
> +	      unsigned int k = (TYPE_VECTOR_SUBPARTS (vectype)
> +				/ TYPE_VECTOR_SUBPARTS (rtype));
> +	      gcc_assert ((k & (k - 1)) == 0);
> +	      if ((j & (k - 1)) == 0)
> +		vec_alloc (ret_ctor_elts, k);
> +	      if (ratype)
> +		{
> +		  unsigned int m, o = nunits / TYPE_VECTOR_SUBPARTS (rtype);
> +		  for (m = 0; m < o; m++)
> +		    {
> +		      tree tem = build4 (ARRAY_REF, rtype, new_temp,
> +					 size_int (m), NULL_TREE, NULL_TREE);
> +		      new_stmt
> +			= gimple_build_assign_with_ops (ARRAY_REF, rtype,
> +							make_ssa_name (rtype,
> +								       NULL),
> +							tem);
> +		      vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +		      CONSTRUCTOR_APPEND_ELT (ret_ctor_elts, NULL_TREE, tem);
> +		    }
> +		  tree clobber = build_constructor (ratype, NULL);
> +		  TREE_THIS_VOLATILE (clobber) = 1;
> +		  new_stmt = gimple_build_assign (new_temp, clobber);
> +		  vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +		}
> +	      else
> +		CONSTRUCTOR_APPEND_ELT (ret_ctor_elts, NULL_TREE, new_temp);
> +	      if ((j & (k - 1)) != k - 1)
> +		continue;
> +	      vec_oprnd0 = build_constructor (vectype, ret_ctor_elts);
> +	      new_stmt
> +		= gimple_build_assign_with_ops (CONSTRUCTOR,
> +						make_ssa_name (vec_dest, NULL),
> +						vec_oprnd0, NULL_TREE);
> +	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +
> +	      if ((unsigned) j == k - 1)
> +		STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
> +	      else
> +		STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
> +
> +	      prev_stmt_info = vinfo_for_stmt (new_stmt);
> +	      continue;
> +	    }
> +	  else if (ratype)
> +	    {
> +	      tree t = build_fold_addr_expr (new_temp);
> +	      t = build2 (MEM_REF, vectype, t,
> +			  build_int_cst (TREE_TYPE (t), 0));
> +	      new_stmt
> +		= gimple_build_assign_with_ops (MEM_REF, vectype,
> +						make_ssa_name (vec_dest,
> +							       NULL), t);
> +	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +	      tree clobber = build_constructor (ratype, NULL);
> +	      TREE_THIS_VOLATILE (clobber) = 1;
> +	      vect_finish_stmt_generation (stmt,
> +					   gimple_build_assign (new_temp,
> +								clobber), gsi);
> +	    }
> +	}
> +
> +      if (j == 0)
> +	STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
> +      else
> +	STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
> +
> +      prev_stmt_info = vinfo_for_stmt (new_stmt);
> +    }
> +
> +  vargs.release ();
> +
> +  /* Update the exception handling table with the vector stmt if necessary.  */
> +  if (maybe_clean_or_replace_eh_stmt (stmt, *vec_stmt))
> +    gimple_purge_dead_eh_edges (gimple_bb (stmt));

But you've early-outed on throwing stmts?  Generally this shouldn't 
happen.

> +  /* The call in STMT might prevent it from being removed in dce.
> +     We however cannot remove it here, due to the way the ssa name
> +     it defines is mapped to the new definition.  So just replace
> +     rhs of the statement with something harmless.  */
> +
> +  if (slp_node)
> +    return true;
> +
> +  if (scalar_dest)
> +    {
> +      type = TREE_TYPE (scalar_dest);
> +      if (is_pattern_stmt_p (stmt_info))
> +	lhs = gimple_call_lhs (STMT_VINFO_RELATED_STMT (stmt_info));
> +      else
> +	lhs = gimple_call_lhs (stmt);
> +      new_stmt = gimple_build_assign (lhs, build_zero_cst (type));
> +    }
> +  else
> +    new_stmt = gimple_build_nop ();
> +  set_vinfo_for_stmt (new_stmt, stmt_info);
> +  set_vinfo_for_stmt (stmt, NULL);
> +  STMT_VINFO_STMT (stmt_info) = new_stmt;
> +  gsi_replace (gsi, new_stmt, false);
> +  unlink_stmt_vdef (stmt);
> +
> +  return true;
> +}
> +
> +
>  /* Function vect_gen_widened_results_half
>  
>     Create a vector stmt whose code, type, number of arguments, and result
> @@ -5838,6 +6439,7 @@ vect_analyze_stmt (gimple stmt, bool *ne
>              || vectorizable_assignment (stmt, NULL, NULL, NULL)
>              || vectorizable_load (stmt, NULL, NULL, NULL, NULL)
>  	    || vectorizable_call (stmt, NULL, NULL, NULL)
> +	    || vectorizable_simd_clone_call (stmt, NULL, NULL, NULL)
>              || vectorizable_store (stmt, NULL, NULL, NULL)
>              || vectorizable_reduction (stmt, NULL, NULL, NULL)
>              || vectorizable_condition (stmt, NULL, NULL, NULL, 0, NULL));
> @@ -5850,6 +6452,7 @@ vect_analyze_stmt (gimple stmt, bool *ne
>                  || vectorizable_assignment (stmt, NULL, NULL, node)
>                  || vectorizable_load (stmt, NULL, NULL, node, NULL)
>  		|| vectorizable_call (stmt, NULL, NULL, node)
> +		|| vectorizable_simd_clone_call (stmt, NULL, NULL, node)
>                  || vectorizable_store (stmt, NULL, NULL, node)
>                  || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node));
>        }
> @@ -5972,6 +6575,11 @@ vect_transform_stmt (gimple stmt, gimple
>        stmt = gsi_stmt (*gsi);
>        break;
>  
> +    case call_simd_clone_vec_info_type:
> +      done = vectorizable_simd_clone_call (stmt, gsi, &vec_stmt, slp_node);
> +      stmt = gsi_stmt (*gsi);
> +      break;
> +
>      case reduc_vec_info_type:
>        done = vectorizable_reduction (stmt, gsi, &vec_stmt, slp_node);
>        gcc_assert (done);
> --- gcc/c/c-decl.c	(.../trunk)	(revision 205223)
> +++ gcc/c/c-decl.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -3646,8 +3646,9 @@ c_builtin_function_ext_scope (tree decl)
>    const char *name = IDENTIFIER_POINTER (id);
>    C_DECL_BUILTIN_PROTOTYPE (decl) = prototype_p (type);
>  
> -  bind (id, decl, external_scope, /*invisible=*/false, /*nested=*/false,
> -	UNKNOWN_LOCATION);
> +  if (external_scope)
> +    bind (id, decl, external_scope, /*invisible=*/false, /*nested=*/false,
> +	  UNKNOWN_LOCATION);
>  
>    /* Builtins in the implementation namespace are made visible without
>       needing to be explicitly declared.  See push_file_scope.  */
> --- gcc/cp/semantics.c	(.../trunk)	(revision 205223)
> +++ gcc/cp/semantics.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -5214,6 +5214,8 @@ finish_omp_clauses (tree clauses)
>  	      t = mark_rvalue_use (t);
>  	      if (!processing_template_decl)
>  		{
> +		  if (TREE_CODE (OMP_CLAUSE_DECL (c)) == PARM_DECL)
> +		    t = maybe_constant_value (t);
>  		  t = fold_build_cleanup_point_expr (TREE_TYPE (t), t);
>  		  if (TREE_CODE (TREE_TYPE (OMP_CLAUSE_DECL (c)))
>  		      == POINTER_TYPE)
> --- gcc/testsuite/gcc.dg/gomp/simd-clones-1.c	(.../trunk)	(revision 0)
> +++ gcc/testsuite/gcc.dg/gomp/simd-clones-1.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fopenmp -fdump-tree-optimized -O3" } */
> +
> +/* Test that functions that have SIMD clone counterparts are not
> +   cloned by IPA-cp.  For example, special_add() below has SIMD clones
> +   created for it.  However, if IPA-cp later decides to clone a
> +   specialization of special_add(x, 666) when analyzing fillit(), we
> +   will forever keep the vectorizer from using the SIMD versions of
> +   special_add in a loop.
> +
> +   If IPA-CP gets taught how to adjust the SIMD clones as well, this
> +   test could be removed.  */
> +
> +#pragma omp declare simd simdlen(4)
> +static int  __attribute__ ((noinline))
> +special_add (int x, int y)
> +{
> +  if (y == 666)
> +    return x + y + 123;
> +  else
> +    return x + y;
> +}
> +
> +void fillit(int *tot)
> +{
> +  int i;
> +
> +  for (i=0; i < 10000; ++i)
> +    tot[i] = special_add (i, 666);
> +}
> +
> +/* { dg-final { scan-tree-dump-not "special_add.constprop" "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> --- gcc/testsuite/gcc.dg/gomp/simd-clones-2.c	(.../trunk)	(revision 0)
> +++ gcc/testsuite/gcc.dg/gomp/simd-clones-2.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -0,0 +1,26 @@
> +/* { dg-options "-fopenmp -fdump-tree-optimized -O" } */
> +
> +#pragma omp declare simd inbranch uniform(c) linear(b:66)
> +#pragma omp declare simd notinbranch aligned(c:32)
> +int addit(int a, int b, int *c)
> +{
> +  return a + b;
> +}
> +
> +#pragma omp declare simd uniform(a) aligned(a:32) linear(k:1) notinbranch
> +float setArray(float *a, float x, int k)
> +{
> +  a[k] = a[k] + x;
> +  return a[k];
> +}
> +
> +/* { dg-final { scan-tree-dump "_ZGVbN4ua32vl_setArray" "optimized" { target i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump "_ZGVbN4vvva32_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump "_ZGVbM4vl66u_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump "_ZGVcN8ua32vl_setArray" "optimized" { target i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump "_ZGVcN4vvva32_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump "_ZGVcM4vl66u_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump "_ZGVdN8ua32vl_setArray" "optimized" { target i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump "_ZGVdN8vvva32_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump "_ZGVdM8vl66u_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> --- gcc/testsuite/gcc.dg/gomp/simd-clones-3.c	(.../trunk)	(revision 0)
> +++ gcc/testsuite/gcc.dg/gomp/simd-clones-3.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -0,0 +1,18 @@
> +/* { dg-options "-fopenmp -fdump-tree-optimized -O2" } */
> +
> +/* Test that if there is no *inbranch clauses, that both the masked and
> +   the unmasked version are created.  */
> +
> +#pragma omp declare simd
> +int addit(int a, int b, int c)
> +{
> +  return a + b;
> +}
> +
> +/* { dg-final { scan-tree-dump "_ZGVbN4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump "_ZGVbM4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump "_ZGVcN4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump "_ZGVcM4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump "_ZGVdN8vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { scan-tree-dump "_ZGVdM8vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
> --- gcc/testsuite/gcc.dg/gomp/simd-clones-4.c	(.../trunk)	(revision 0)
> +++ gcc/testsuite/gcc.dg/gomp/simd-clones-4.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> +/* { dg-options "-fopenmp" } */
> +
> +#pragma omp declare simd simdlen(4) notinbranch
> +int f2 (int a, int b)
> +{
> +  if (a > 5)
> +    return a + b;
> +  else
> +    return a - b;
> +}
> --- gcc/testsuite/gcc.dg/gomp/simd-clones-5.c	(.../trunk)	(revision 0)
> +++ gcc/testsuite/gcc.dg/gomp/simd-clones-5.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> +/* { dg-options "-fopenmp -w" } */
> +
> +/* ?? The -w above is to inhibit the following warning for now:
> +   a.c:2:6: warning: AVX vector argument without AVX enabled changes
> +   the ABI [enabled by default].  */
> +
> +#pragma omp declare simd notinbranch simdlen(4)
> +void foo (int *a)
> +{
> +  *a = 555;
> +}
> --- gcc/testsuite/gcc.dg/gomp/simd-clones-6.c	(.../trunk)	(revision 0)
> +++ gcc/testsuite/gcc.dg/gomp/simd-clones-6.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> +/* { dg-options "-fopenmp" } */
> +
> +/* Test that array subscripts are properly adjusted.  */
> +
> +int array[1000];
> +#pragma omp declare simd notinbranch simdlen(4)
> +void foo (int i)
> +{
> +  array[i] = 555;
> +}
> --- gcc/testsuite/gcc.dg/gomp/simd-clones-7.c	(.../trunk)	(revision 0)
> +++ gcc/testsuite/gcc.dg/gomp/simd-clones-7.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -0,0 +1,16 @@
> +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> +/* { dg-options "-fopenmp -w" } */
> +
> +int array[1000];
> +
> +#pragma omp declare simd notinbranch simdlen(4)
> +void foo (int *a, int b)
> +{
> +  a[b] = 555;
> +}
> +
> +#pragma omp declare simd notinbranch simdlen(4)
> +void bar (int *a)
> +{
> +  *a = 555;
> +}
> --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-10a.c	(.../trunk)	(revision 0)
> +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-10a.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +
> +#include "vect-simd-clone-10.h"
> +
> +#pragma omp declare simd notinbranch
> +extern int
> +foo (long int a, int b, int c)
> +{
> +  return a + b + c;
> +}
> +
> +#pragma omp declare simd notinbranch
> +extern long int
> +bar (int a, int b, long int c)
> +{
> +  return a + b + c;
> +}
> --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c	(.../trunk)	(revision 0)
> +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -0,0 +1,83 @@
> +/* { dg-additional-options "-fopenmp-simd" } */
> +/* { dg-additional-options "-mavx" { target avx_runtime } } */
> +/* { dg-additional-sources vect-simd-clone-10a.c } */
> +
> +#include "tree-vect.h"
> +
> +#ifndef N
> +#define N 1024
> +#endif
> +
> +int a[N], b[N];
> +long int c[N];
> +unsigned char d[N];
> +
> +#include "vect-simd-clone-10.h"
> +
> +__attribute__((noinline)) void
> +fn1 (void)
> +{
> +  int i;
> +  #pragma omp simd
> +  for (i = 0; i < N; i++)
> +    a[i] = foo (c[i], a[i], b[i]) + 6;
> +  #pragma omp simd
> +  for (i = 0; i < N; i++)
> +    c[i] = bar (a[i], b[i], c[i]) * 2;
> +}
> +
> +__attribute__((noinline)) void
> +fn2 (void)
> +{
> +  int i;
> +  #pragma omp simd
> +  for (i = 0; i < N; i++)
> +    {
> +      a[i] = foo (c[i], a[i], b[i]) + 6;
> +      d[i]++;
> +    }
> +  #pragma omp simd
> +  for (i = 0; i < N; i++)
> +    {
> +      c[i] = bar (a[i], b[i], c[i]) * 2;
> +      d[i] /= 2;
> +    }
> +}
> +
> +__attribute__((noinline)) void
> +fn3 (void)
> +{
> +  int i;
> +  for (i = 0; i < N; i++)
> +    {
> +      a[i] = i * 2;
> +      b[i] = 17 + (i % 37);
> +      c[i] = (i & 63);
> +      d[i] = 16 + i;
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  check_vect ();
> +  fn3 ();
> +  fn1 ();
> +  for (i = 0; i < N; i++)
> +    if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
> +	|| b[i] != 17 + (i % 37)
> +	|| c[i] != i * 4 + 80 + 4 * (i % 37) + 4 * (i & 63))
> +      abort ();
> +  fn3 ();
> +  fn2 ();
> +  for (i = 0; i < N; i++)
> +    if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
> +	|| b[i] != 17 + (i % 37)
> +	|| c[i] != i * 4 + 80 + 4 * (i % 37) + 4 * (i & 63)
> +	|| d[i] != ((unsigned char) (17 + i)) / 2)
> +      abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.h	(.../trunk)	(revision 0)
> +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.h	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -0,0 +1,4 @@
> +#pragma omp declare simd notinbranch
> +extern int foo (long int a, int b, int c);
> +#pragma omp declare simd notinbranch
> +extern long int bar (int a, int b, long int c);
> --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c	(.../trunk)	(revision 0)
> +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -0,0 +1,58 @@
> +/* { dg-additional-options "-fopenmp-simd" } */
> +/* { dg-additional-options "-mavx" { target avx_runtime } } */
> +
> +#include "tree-vect.h"
> +
> +#ifndef N
> +#define N 1024
> +#endif
> +
> +int array[N];
> +
> +#pragma omp declare simd simdlen(4) notinbranch
> +#pragma omp declare simd simdlen(4) notinbranch uniform(b) linear(c:3)
> +#pragma omp declare simd simdlen(8) notinbranch
> +#pragma omp declare simd simdlen(8) notinbranch uniform(b) linear(c:3)
> +__attribute__((noinline)) int
> +foo (int a, int b, int c)
> +{
> +  if (a < 30)
> +    return 5;
> +  return a + b + c;
> +}
> +
> +__attribute__((noinline, noclone)) void
> +bar ()
> +{
> +  int i;
> +#pragma omp simd
> +  for (i = 0; i < N; ++i)
> +    array[i] = foo (i, 123, i * 3);
> +}
> +
> +__attribute__((noinline, noclone)) void
> +baz ()
> +{
> +  int i;
> +#pragma omp simd
> +  for (i = 0; i < N; ++i)
> +    array[i] = foo (i, array[i], i * 3);
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  check_vect ();
> +  bar ();
> +  for (i = 0; i < N; i++)
> +    if (array[i] != (i < 30 ? 5 : i * 4 + 123))
> +      abort ();
> +  baz ();
> +  for (i = 0; i < N; i++)
> +    if (array[i] != (i < 30 ? 5 : i * 8 + 123))
> +      abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c	(.../trunk)	(revision 0)
> +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -0,0 +1,52 @@
> +/* { dg-additional-options "-fopenmp-simd" } */
> +/* { dg-additional-options "-mavx" { target avx_runtime } } */
> +
> +#include "tree-vect.h"
> +
> +#ifndef N
> +#define N 1024
> +#endif
> +
> +int array[N] __attribute__((aligned (32)));
> +
> +#pragma omp declare simd simdlen(4) notinbranch aligned(a:16) uniform(a) linear(b)
> +#pragma omp declare simd simdlen(4) notinbranch aligned(a:32) uniform(a) linear(b)
> +#pragma omp declare simd simdlen(8) notinbranch aligned(a:16) uniform(a) linear(b)
> +#pragma omp declare simd simdlen(8) notinbranch aligned(a:32) uniform(a) linear(b)
> +__attribute__((noinline)) void
> +foo (int *a, int b, int c)
> +{
> +  a[b] = c;
> +}
> +
> +__attribute__((noinline, noclone)) void
> +bar ()
> +{
> +  int i;
> +#pragma omp simd
> +  for (i = 0; i < N; ++i)
> +    foo (array, i, i * array[i]);
> +}
> +
> +__attribute__((noinline, noclone)) void
> +baz ()
> +{
> +  int i;
> +  for (i = 0; i < N; i++)
> +    array[i] = 5 * (i & 7);
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  check_vect ();
> +  baz ();
> +  bar ();
> +  for (i = 0; i < N; i++)
> +    if (array[i] != 5 * (i & 7) * i)
> +      abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c	(.../trunk)	(revision 0)
> +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -0,0 +1,45 @@
> +/* { dg-additional-options "-fopenmp-simd" } */
> +/* { dg-additional-options "-mavx" { target avx_runtime } } */
> +
> +#include "tree-vect.h"
> +
> +#ifndef N
> +#define N 1024
> +#endif
> +
> +int d[N], e[N];
> +
> +#pragma omp declare simd simdlen(4) notinbranch uniform(b) linear(c:3)
> +__attribute__((noinline)) int
> +foo (int a, int b, int c)
> +{
> +  if (a < 30)
> +    return 5;
> +  return a + b + c;
> +}
> +
> +__attribute__((noinline, noclone)) void
> +bar ()
> +{
> +  int i;
> +#pragma omp simd
> +  for (i = 0; i < N; ++i)
> +    {
> +      d[i] = foo (i, 123, i * 3);
> +      e[i] = e[i] + i;
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  check_vect ();
> +  bar ();
> +  for (i = 0; i < N; i++)
> +    if (d[i] != (i < 30 ? 5 : i * 4 + 123) || e[i] != i)
> +      abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c	(.../trunk)	(revision 0)
> +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -0,0 +1,48 @@
> +/* { dg-additional-options "-fopenmp-simd" } */
> +/* { dg-additional-options "-mavx" { target avx_runtime } } */
> +
> +#include "tree-vect.h"
> +
> +#ifndef N
> +#define N 1024
> +#endif
> +
> +float d[N];
> +int e[N];
> +unsigned short f[N];
> +
> +#pragma omp declare simd simdlen(8) notinbranch uniform(b)
> +__attribute__((noinline)) float
> +foo (float a, float b, float c)
> +{
> +  if (a < 30)
> +    return 5.0f;
> +  return a + b + c;
> +}
> +
> +__attribute__((noinline, noclone)) void
> +bar ()
> +{
> +  int i;
> +#pragma omp simd
> +  for (i = 0; i < N; ++i)
> +    {
> +      d[i] = foo (i, 123, i * 3);
> +      e[i] = e[i] * 3;
> +      f[i] = f[i] + 1;
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  check_vect ();
> +  bar ();
> +  for (i = 0; i < N; i++)
> +    if (d[i] != (i < 30 ? 5.0f : i * 4 + 123.0f) || e[i] || f[i] != 1)
> +      abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c	(.../trunk)	(revision 0)
> +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -0,0 +1,43 @@
> +/* { dg-additional-options "-fopenmp-simd" } */
> +/* { dg-additional-options "-mavx" { target avx_runtime } } */
> +
> +#include "tree-vect.h"
> +
> +#ifndef N
> +#define N 1024
> +#endif
> +
> +int d[N], e[N];
> +
> +#pragma omp declare simd simdlen(4) notinbranch uniform(b) linear(c:3)
> +__attribute__((noinline)) long long int
> +foo (int a, int b, int c)
> +{
> +  return a + b + c;
> +}
> +
> +__attribute__((noinline, noclone)) void
> +bar ()
> +{
> +  int i;
> +#pragma omp simd
> +  for (i = 0; i < N; ++i)
> +    {
> +      d[i] = foo (i, 123, i * 3);
> +      e[i] = e[i] + i;
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  check_vect ();
> +  bar ();
> +  for (i = 0; i < N; i++)
> +    if (d[i] != i * 4 + 123 || e[i] != i)
> +      abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-6.c	(.../trunk)	(revision 0)
> +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-6.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -0,0 +1,74 @@
> +/* { dg-additional-options "-fopenmp-simd" } */
> +/* { dg-additional-options "-mavx" { target avx_runtime } } */
> +
> +#include "tree-vect.h"
> +
> +#ifndef N
> +#define N 1024
> +#endif
> +
> +int a[N];
> +long long int b[N];
> +short c[N];
> +
> +#pragma omp declare simd
> +#pragma omp declare simd uniform(b) linear(c:3)
> +__attribute__((noinline)) short
> +foo (int a, long long int b, short c)
> +{
> +  return a + b + c;
> +}
> +
> +__attribute__((noinline, noclone)) void
> +bar (int x)
> +{
> +  int i;
> +  if (x == 0)
> +    {
> +    #pragma omp simd
> +      for (i = 0; i < N; i++)
> +	c[i] = foo (a[i], b[i], c[i]);
> +    }
> +  else
> +    {
> +    #pragma omp simd
> +      for (i = 0; i < N; i++)
> +	c[i] = foo (a[i], x, i * 3);
> +    }
> +}
> +
> +__attribute__((noinline, noclone)) void
> +baz (void)
> +{
> +  int i;
> +  for (i = 0; i < N; i++)
> +    {
> +      a[i] = 2 * i;
> +      b[i] = -7 * i + 6;
> +      c[i] = (i & 31) << 4;
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  check_vect ();
> +  baz ();
> +  bar (0);
> +  for (i = 0; i < N; i++)
> +    if (a[i] != 2 * i || b[i] != 6 - 7 * i
> +	|| c[i] != 6 - 5 * i + ((i & 31) << 4))
> +      abort ();
> +    else
> +      a[i] = c[i];
> +  bar (17);
> +  for (i = 0; i < N; i++)
> +    if (a[i] != 6 - 5 * i + ((i & 31) << 4)
> +	|| b[i] != 6 - 7 * i
> +	|| c[i] != 23 - 2 * i + ((i & 31) << 4))
> +      abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-7.c	(.../trunk)	(revision 0)
> +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-7.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -0,0 +1,74 @@
> +/* { dg-additional-options "-fopenmp-simd" } */
> +/* { dg-additional-options "-mavx" { target avx_runtime } } */
> +
> +#include "tree-vect.h"
> +
> +#ifndef N
> +#define N 1024
> +#endif
> +
> +int a[N];
> +long long int b[N];
> +short c[N];
> +
> +#pragma omp declare simd
> +#pragma omp declare simd uniform(b) linear(c:3)
> +__attribute__((noinline)) short
> +foo (int a, long long int b, int c)
> +{
> +  return a + b + c;
> +}
> +
> +__attribute__((noinline, noclone)) void
> +bar (int x)
> +{
> +  int i;
> +  if (x == 0)
> +    {
> +    #pragma omp simd
> +      for (i = 0; i < N; i++)
> +	c[i] = foo (a[i], b[i], c[i]);
> +    }
> +  else
> +    {
> +    #pragma omp simd
> +      for (i = 0; i < N; i++)
> +	c[i] = foo (a[i], x, i * 3);
> +    }
> +}
> +
> +__attribute__((noinline, noclone)) void
> +baz (void)
> +{
> +  int i;
> +  for (i = 0; i < N; i++)
> +    {
> +      a[i] = 2 * i;
> +      b[i] = -7 * i + 6;
> +      c[i] = (i & 31) << 4;
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  check_vect ();
> +  baz ();
> +  bar (0);
> +  for (i = 0; i < N; i++)
> +    if (a[i] != 2 * i || b[i] != 6 - 7 * i
> +	|| c[i] != 6 - 5 * i + ((i & 31) << 4))
> +      abort ();
> +    else
> +      a[i] = c[i];
> +  bar (17);
> +  for (i = 0; i < N; i++)
> +    if (a[i] != 6 - 5 * i + ((i & 31) << 4)
> +	|| b[i] != 6 - 7 * i
> +	|| c[i] != 23 - 2 * i + ((i & 31) << 4))
> +      abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c	(.../trunk)	(revision 0)
> +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -0,0 +1,94 @@
> +/* { dg-additional-options "-fopenmp-simd" } */
> +/* { dg-additional-options "-mavx" { target avx_runtime } } */
> +
> +#include "tree-vect.h"
> +
> +#ifndef N
> +#define N 1024
> +#endif
> +
> +int a[N], b[N];
> +long int c[N];
> +unsigned char d[N];
> +
> +#pragma omp declare simd simdlen(8) notinbranch
> +__attribute__((noinline)) int
> +foo (long int a, int b, int c)
> +{
> +  return a + b + c;
> +}
> +
> +#pragma omp declare simd simdlen(8) notinbranch
> +__attribute__((noinline)) long int
> +bar (int a, int b, long int c)
> +{
> +  return a + b + c;
> +}
> +
> +__attribute__((noinline)) void
> +fn1 (void)
> +{
> +  int i;
> +  #pragma omp simd
> +  for (i = 0; i < N; i++)
> +    a[i] = foo (c[i], a[i], b[i]) + 6;
> +  #pragma omp simd
> +  for (i = 0; i < N; i++)
> +    c[i] = bar (a[i], b[i], c[i]) * 2;
> +}
> +
> +__attribute__((noinline)) void
> +fn2 (void)
> +{
> +  int i;
> +  #pragma omp simd
> +  for (i = 0; i < N; i++)
> +    {
> +      a[i] = foo (c[i], a[i], b[i]) + 6;
> +      d[i]++;
> +    }
> +  #pragma omp simd
> +  for (i = 0; i < N; i++)
> +    {
> +      c[i] = bar (a[i], b[i], c[i]) * 2;
> +      d[i] /= 2;
> +    }
> +}
> +
> +__attribute__((noinline)) void
> +fn3 (void)
> +{
> +  int i;
> +  for (i = 0; i < N; i++)
> +    {
> +      a[i] = i * 2;
> +      b[i] = 17 + (i % 37);
> +      c[i] = (i & 63);
> +      d[i] = 16 + i;
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  check_vect ();
> +  fn3 ();
> +  fn1 ();
> +  for (i = 0; i < N; i++)
> +    if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
> +	|| b[i] != 17 + (i % 37)
> +	|| c[i] != i * 4 + 80 + 4 * (i % 37) + 4 * (i & 63))
> +      abort ();
> +  fn3 ();
> +  fn2 ();
> +  for (i = 0; i < N; i++)
> +    if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
> +	|| b[i] != 17 + (i % 37)
> +	|| c[i] != i * 4 + 80 + 4 * (i % 37) + 4 * (i & 63)
> +	|| d[i] != ((unsigned char) (17 + i)) / 2)
> +      abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> --- gcc/testsuite/gcc.dg/vect/vect-simd-clone-9.c	(.../trunk)	(revision 0)
> +++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-9.c	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -0,0 +1,94 @@
> +/* { dg-additional-options "-fopenmp-simd" } */
> +/* { dg-additional-options "-mavx" { target avx_runtime } } */
> +
> +#include "tree-vect.h"
> +
> +#ifndef N
> +#define N 1024
> +#endif
> +
> +int a[N], b[N];
> +long int c[N];
> +unsigned char d[N];
> +
> +#pragma omp declare simd notinbranch
> +__attribute__((noinline)) static int
> +foo (long int a, int b, int c)
> +{
> +  return a + b + c;
> +}
> +
> +#pragma omp declare simd notinbranch
> +__attribute__((noinline)) static long int
> +bar (int a, int b, long int c)
> +{
> +  return a + b + c;
> +}
> +
> +__attribute__((noinline)) void
> +fn1 (void)
> +{
> +  int i;
> +  #pragma omp simd
> +  for (i = 0; i < N; i++)
> +    a[i] = foo (c[i], a[i], b[i]) + 6;
> +  #pragma omp simd
> +  for (i = 0; i < N; i++)
> +    c[i] = bar (a[i], b[i], c[i]) * 2;
> +}
> +
> +__attribute__((noinline)) void
> +fn2 (void)
> +{
> +  int i;
> +  #pragma omp simd
> +  for (i = 0; i < N; i++)
> +    {
> +      a[i] = foo (c[i], a[i], b[i]) + 6;
> +      d[i]++;
> +    }
> +  #pragma omp simd
> +  for (i = 0; i < N; i++)
> +    {
> +      c[i] = bar (a[i], b[i], c[i]) * 2;
> +      d[i] /= 2;
> +    }
> +}
> +
> +__attribute__((noinline)) void
> +fn3 (void)
> +{
> +  int i;
> +  for (i = 0; i < N; i++)
> +    {
> +      a[i] = i * 2;
> +      b[i] = 17 + (i % 37);
> +      c[i] = (i & 63);
> +      d[i] = 16 + i;
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  check_vect ();
> +  fn3 ();
> +  fn1 ();
> +  for (i = 0; i < N; i++)
> +    if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
> +	|| b[i] != 17 + (i % 37)
> +	|| c[i] != i * 4 + 80 + 4 * (i % 37) + 4 * (i & 63))
> +      abort ();
> +  fn3 ();
> +  fn2 ();
> +  for (i = 0; i < N; i++)
> +    if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
> +	|| b[i] != 17 + (i % 37)
> +	|| c[i] != i * 4 + 80 + 4 * (i % 37) + 4 * (i & 63)
> +	|| d[i] != ((unsigned char) (17 + i)) / 2)
> +      abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> --- gcc/testsuite/g++.dg/gomp/declare-simd-1.C	(.../trunk)	(revision 205223)
> +++ gcc/testsuite/g++.dg/gomp/declare-simd-1.C	(.../branches/gomp-4_0-branch)	(revision 205231)
> @@ -239,5 +239,5 @@ struct D
>  void
>  f38 (D &d)
>  {
> -  d.f37 <12> (6);
> +  d.f37 <16> (6);
>  }
> 
> 	Jakub

Overall it looks good - it would be nice to split out and commit
separately the IPA cloning infrastructure re-org (and the expr.c hunk).

The LTO issue needs to be addressed - the simplest thing to me looks
to defer cloning to LTRANS stage.

Thanks,
Richard.
Jakub Jelinek Nov. 22, 2013, 11:19 a.m. UTC | #2
On Fri, Nov 22, 2013 at 11:08:41AM +0100, Richard Biener wrote:
> > @@ -284,6 +382,12 @@ public:
> >    /* Declaration node used to be clone of. */
> >    tree former_clone_of;
> >  
> > +  /* If this is a SIMD clone, this points to the SIMD specific
> > +     information for it.  */
> > +  struct cgraph_simd_clone *simdclone;
> > +  /* If this function has SIMD clones, this points to the first clone.  */
> > +  struct cgraph_node *simd_clones;
> > +
> 
> I wonder how you run all of this through LTO (I'll see below I guess ;))

It doesn't work, as in, all the added testcases work just fine without -flto
and all of them ICE with -flto, but there are multiple known issues with LTO
before that (internal fns, etc.).  More below.

> The expr.c hunk is also ok independently of the patch.

Ok, thanks (though without the rest of the patch probably nothing emits it).

> > @@ -3758,6 +3772,124 @@ ipa_modify_call_arguments (struct cgraph
> >    free_dominance_info (CDI_DOMINATORS);
> >  }
> 
> You've run the above through Martin IIRC, but ...

Aldy did.

> > +/* If the expression *EXPR should be replaced by a reduction of a parameter, do
> > +   so.  ADJUSTMENTS is a pointer to a vector of adjustments.  CONVERT
> > +   specifies whether the function should care about type incompatibility the
> > +   current and new expressions.  If it is false, the function will leave
> > +   incompatibility issues to the caller.  Return true iff the expression
> > +   was modified. */
> > +
> > +bool
> > +ipa_modify_expr (tree *expr, bool convert,
> > +		 ipa_parm_adjustment_vec adjustments)
> > +{
> > +  struct ipa_parm_adjustment *cand
> > +    = ipa_get_adjustment_candidate (&expr, &convert, adjustments, false);
> > +  if (!cand)
> > +    return false;
> > +
> > +  tree src;
> > +  if (cand->by_ref)
> > +    src = build_simple_mem_ref (cand->new_decl);
> 
> is this function mostly copied from elsewhere?  Because
> using build_simple_mem_ref always smells like possible TBAA problems.

Perhaps, but this is just code reorg, the same

-  if (cand->by_ref)
-    src = build_simple_mem_ref (cand->reduction);
-  else
-    src = cand->reduction;

used to sit in sra_ipa_modify_expr before.

> 
> > +  else
> > +    src = cand->new_decl;
> > +
> > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > +    {
> > +      fprintf (dump_file, "About to replace expr ");
> > +      print_generic_expr (dump_file, *expr, 0);
> > +      fprintf (dump_file, " with ");
> > +      print_generic_expr (dump_file, src, 0);
> > +      fprintf (dump_file, "\n");
> > +    }
> > +
> > +  if (convert && !useless_type_conversion_p (TREE_TYPE (*expr), cand->type))
> > +    {
> > +      tree vce = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (*expr), src);
> > +      *expr = vce;
> 
> Why build1 and not fold it?  I assume from above you either have a plain 
> decl (cand->new_decl) or a MEM_REF.  For both cases simply folding
> the VCE into a MEM_REF works.

Again, preexisting code from sra_ipa_modify_expr.  Can it be changed
incrementally/independently of this?

> > +    }
> > +  else
> > +    *expr = src;
> > +  return true;
> > +}
> > +
> > +/* If T is an SSA_NAME, return NULL if it is not a default def or
> > +   return its base variable if it is.  If IGNORE_DEFAULT_DEF is true,
> > +   the base variable is always returned, regardless if it is a default
> > +   def.  Return T if it is not an SSA_NAME.  */
> > +
> > +static tree
> > +get_ssa_base_param (tree t, bool ignore_default_def)
> > +{
> > +  if (TREE_CODE (t) == SSA_NAME)
> > +    {
> > +      if (ignore_default_def || SSA_NAME_IS_DEFAULT_DEF (t))
> > +	return SSA_NAME_VAR (t);
> > +      else
> > +	return NULL_TREE;
> > +    }
> > +  return t;
> > +}
> 
> This function will return non-NULL for non-PARMs - is that intended?

Again, seems to be preexisting code from tree-sra.c.  Aldy/Martin?

> > +  /* Ignore
> > +     #pragma omp declare simd
> > +     extern int foo ();
> > +     in C, there we don't know the argument types at all.  */
> > +  if (!node->definition
> > +      && TYPE_ARG_TYPES (TREE_TYPE (node->decl)) == NULL_TREE)
> > +    return;
> 
> I wonder if you want to diagnose this case (but where?  best during
> parsing if that is allowed).

It isn't invalid per the standard, though of course if you have
#pragma omp declare simd
int foo ();
you can't supply any clauses that refer to parameters (thus, all are assumed
to be vector arguments.  If the function is defined locally and supplies
arguments there, it will have DECL_ARGUMENTS and can be handled easily,
otherwise I just chose to punt, it is too hard for too little gain.
Perhaps could warn with -Wopenmp-simd about it.  I mean to guard also
the other warnings about inability to emit simd clones with -Wopenmp-simd.

> > +      if (count == 0)
> > +	continue;
> > +
> > +      for (int i = 0; i < count * 2; i++)
> 
> Here (and also elsewhere) the patch could do with a few extra
> comments what is happening.

Ok.

> > --- gcc/passes.def	(.../trunk)	(revision 205223)
> > +++ gcc/passes.def	(.../branches/gomp-4_0-branch)	(revision 205231)
> > @@ -97,6 +97,7 @@ along with GCC; see the file COPYING3.
> >        NEXT_PASS (pass_feedback_split_functions);
> >    POP_INSERT_PASSES ()
> >    NEXT_PASS (pass_ipa_increase_alignment);
> > +  NEXT_PASS (pass_omp_simd_clone);
> >    NEXT_PASS (pass_ipa_tm);
> >    NEXT_PASS (pass_ipa_lower_emutls);
> >    TERMINATE_PASS_LIST ()
> 
> So clones are created before streaming LTO.  You do have vect.exp
> testcases that are also run through -flto but does it actually
> "work" there?  I remember seeing changes to cgraph unreachable
> node removal based on some flag that isn't streamed, no?

Aldy has done the pass placement, I wonder also whether it wouldn't be
best to put the OpenMP cloning as the very last IPA pass where all the other
cloning etc. is already done.
Right now we want to punt on IPA-CP/IPA-SRA etc. cloning of
#pragma omp declare simd functions, because if the simd clones are created
first, then cloning the origins and adjusting calls to them would lead to
the simd clones not actually being used, and if simd clones are created
late, on the other side the code isn't able to adjust "omp declare simd"
attribute (hopefully it could be taught at least e.g. about removing
arguments, either because they are unused or because they can be assumed
to be constant, we perhaps could punt only if IPA cloning wants to replace
an argument with something else).

> > +		      tree fndecl = gimple_call_fndecl (stmt), op;
> > +		      if (fndecl != NULL_TREE)
> > +			{
> > +			  struct cgraph_node *node = cgraph_get_node (fndecl);
> > +			  if (node != NULL && node->simd_clones != NULL)
> 
> So you use node->simd_clones which also need LTO streaming.
> 
> What's the reason you cannot defer SIMD cloning to LTRANS stage
> as simple IPA pass next to IPA-PTA?

Yeah, see above.
> 
> > +			    {
> > +			      unsigned int j, n = gimple_call_num_args (stmt);
> > +			      for (j = 0; j < n; j++)
> > +				{
> > +				  op = gimple_call_arg (stmt, j);
> > +				  if (DECL_P (op)
> > +				      || (REFERENCE_CLASS_P (op)
> > +					  && get_base_address (op)))
> > +				    break;
> > +				}
> > +			      op = gimple_call_lhs (stmt);
> > +			      /* Ignore #pragma omp declare simd functions
> > +				 if they don't have data references in the
> > +				 call stmt itself.  */
> > +			      if (j == n
> > +				  && !(op
> > +				       && (DECL_P (op)
> > +					   || (REFERENCE_CLASS_P (op)
> > +					       && get_base_address (op)))))
> > +				continue;
> 
> Hmm.  I guess I have an idea now how to "better" support calls in
> data-ref/dependence analysis.  The above is fine for now - you
> might want to dump sth here if you fail because datarefs in a declare
> simd fn call.

Okay.

> > +	      if (is_gimple_call (stmt))
> > +		{
> > +		  /* Ignore calls with no lhs.  These must be calls to
> > +		     #pragma omp simd functions, and what vectorization factor
> > +		     it really needs can't be determined until
> > +		     vectorizable_simd_clone_call.  */
> 
> Ick - that's bad.  Well, or rather it doesn't participate in
> vectorization factor determining then, resulting in missed
> vectorizations eventually.  You basically say "any vect factor is ok"
> here?

Right.  The thing is, if there is no lhs, I really don't know how it will
participate in the vectorization factor decision, and won't know it until
the vectorizable_simd_clone_call call, because whether a particular
clone is usable depends on which of the arguments are uniform, linear (with
what linear step) and tons of other things.
Perhaps if there is just one simd clone or all simd clones have some
non-empty set of arguments all without uniform/linear clauses, then we could
pick the smallest of those surely vector args as the one for determining
vectorization factor.  If those arguments have internal def, then the type
will be used already somewhere else in the loop to determine vf, so it is
only about parameters that are passed constant/external def values, but are
required to be in vector parameters.  But I believe
vectorizable_simd_clone_call can handle those just fine, say if you have
all types in the loop long and thus vf decisions are only for long,
so for AVX2 say vf = 4, then if you have
#pragma omp declare simd uniform (a) aligned (a : 32) linear (b)
void foo (long *a, long b, int c);
and pass constant 23 to it, then if there is a simdlen(4) clone (will be
on i?86/x86_64), then the last argument is passed in V4SImode parameter
and the code should handle it fine.  Similarly if all types are int
and there is a vector long argument passed a constant (or external def),
it will be passed in two parameters, each one containing half, and the
function should handle that too.
> 
> > +		  if (STMT_VINFO_VECTYPE (stmt_info) == NULL_TREE)
> > +		    {
> > +		      unsigned int j, n = gimple_call_num_args (stmt);
> > +		      for (j = 0; j < n; j++)
> > +			{
> > +			  scalar_type = TREE_TYPE (gimple_call_arg (stmt, j));
> > +			  vectype = get_vectype_for_scalar_type (scalar_type);
> > +			  if (vectype)
> > +			    {
> > +			      STMT_VINFO_VECTYPE (stmt_info) = vectype;
> > +			      break;
> > +			    }
> > +			}
> > +		    }
> > +		  if (STMT_VINFO_VECTYPE (stmt_info) != NULL_TREE)
> > +		    {
> > +		      if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
> > +			{
> > +			  pattern_def_seq = NULL;
> > +			  gsi_next (&si);
> > +			}
> > +		      continue;
> > +		    }
> 
> Both cases above need comments - why do you chose the first param
> for determining STMT_VINFO_VECTYPE?  Isn't STMT_VINFO_VECTYPE
> completely irrelevant for calls w/o LHS?  Answer: yes it is!

It is completely irrelevant, yes.

> I'd have expected an unconditional continue here (and leave
> STMT_VINFO_VECTYPE == NULL - fact is that the vector type of
> the argument is determined by its definition and thus may
> be different from what you record here anyway).

Unfortunately it doesn't work (tried that).  The way all the
vectorizable_* functions are called in sequence, most of them
actually look at STMT_VINFO_VECTYPE before bailing out because
they are for stmts that aren't simd clone calls and thus ICE/segfault.
It was much easier to pass some non-NULL value than to change all of them.

> > +  if (stmt_can_throw_internal (stmt))
> > +    return false;
> 
> Can't happen (loop form checks).

But vectorizable_call has the same call.  So shall both be removed?

> > +  vectype = STMT_VINFO_VECTYPE (stmt_info);
> 
> See above - questionable if this doesn't result from looking at
> the LHS.

This particular function just loads it into a variable and uses
only if it has lhs.

> > +      if (thisarginfo.vectype != NULL_TREE
> > +	  && loop_vinfo
> > +	  && TREE_CODE (op) == SSA_NAME
> > +	  && simple_iv (loop, loop_containing_stmt (stmt), op, &iv, false)
> > +	  && tree_fits_shwi_p (iv.step))
> > +	{
> > +	  thisarginfo.linear_step = tree_to_shwi (iv.step);
> 
> Hmm, you should check thisarginfo.dt instead (I assume this case
> is for induction/reduction defs)?  In this case you also should
> use STMT_VINFO_LOOP_PHI_EVOLUTION_PART and not re-analyze via simple_iv.

I can try that.
> 
> > +	  thisarginfo.op = iv.base;
> > +	}
> > +      else if (thisarginfo.vectype == NULL_TREE
> > +	       && POINTER_TYPE_P (TREE_TYPE (op)))
> > +	thisarginfo.align = get_pointer_alignment (op) / BITS_PER_UNIT;
> 
> So this is for dt_external defs?

I guess even both vect_constant_def and vect_external_def, simply something
that is uniform.

> Please switch on thisarginfo.dt here - that more naturally explains
> what you are doing (otherwise this definitely misses a comment).

> > +      this_badness += target_badness * 512;
> > +      /* FORNOW: Have to add code to add the mask argument.  */
> > +      if (n->simdclone->inbranch)
> > +	continue;
> 
> We don't support if-converting calls anyway, no?

Not yet.  Supporting them I guess depends on the
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01268.html
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01437.html
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01550.html
series.  With that infrastructure, I think we could e.g. represent
the conditional calls as MASK_CALL internal call that would have
a mask argument (like MASK_LOAD/STORE), then ADDR_EXPR of the
function decl that has simd clones, plus the original arguments,
or something similar, then we'd just extract the function decl
from it in this function and just vectorize the mask argument
too and pass it through as the last argument (or set of arguments)
to the inbranch simd clone.

> > +      for (i = 0; i < nargs; i++)
> > +	{
> > +	  switch (n->simdclone->args[i].arg_type)
> > +	    {
> > +	    case SIMD_CLONE_ARG_TYPE_VECTOR:
> > +	      if (!useless_type_conversion_p
> > +		     (n->simdclone->args[i].orig_type,
> > +		      TREE_TYPE (gimple_call_arg (stmt, i))))
> > +		i = -1;
> 
> But you don't verify the vectype against the clone vectype?

The code can handle vector narrowing or widening, splitting
into multiple arguments etc.  If the clone exist, we know the
corresponding vector type exists, so does the arginfo[i].vectype
that the vectorizer gives us the argument in.
The above only handles the case where arguments are promoted
from the types in TYPE_ARG_TYPES of the call/DECL_ARGUMENTS
to something wider in the GIMPLE_CALL (happens for short/char
arguments apparently).  The above code just punts on it, I don't
want to have in that function yet another full copy of narrowing/widening
conversions.  The plan was (so far unimplemented) to handle this
in tree-vect-patterns.c, if we have say char argument and pass an
int to it, if the argument is constant, we'd just fold_convert it
to the right type, if there is widening right before it, we'd use
the unwidened SSA_NAME instead, otherwise narrow.  Then vf
determination etc. would handle it right.  Does that look reasonable to you?

> > +	      else if (arginfo[i].vectype == NULL_TREE
> 
> I'd like to see checks based on the def type, not vectype.

Ok.
> 
> > +		       || arginfo[i].linear_step)
> > +		this_badness += 64;
> > +	      break;
> > +	    case SIMD_CLONE_ARG_TYPE_UNIFORM:
> > +	      if (arginfo[i].vectype != NULL_TREE)
> 
> Likewise (and below, too).

> > +  if (!vec_stmt) /* transformation not required.  */
> > +    {
> > +      STMT_VINFO_TYPE (stmt_info) = call_simd_clone_vec_info_type;
> > +      if (dump_enabled_p ())
> > +	dump_printf_loc (MSG_NOTE, vect_location,
> > +			 "=== vectorizable_simd_clone_call ===\n");
> > +/*      vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL); */
> > +      arginfo.release ();
> 
> Please save the result from the analysis (selecting the simd clone)
> in the stmt_vinfo and skip the analysis during transform phase.

Just stick there the selected cgraph_node?

As for the cost computation commented out above, it is hard to predict it
right, probably we should at least add the cost of the scalar call, so
the vectorizable function isn't considered cheaper.  But more than that?

> > +		      vec_oprnd0
> > +			= build3 (BIT_FIELD_REF, atype, vec_oprnd0,
> > +				  build_int_cst (integer_type_node, prec),
> > +				  build_int_cst (integer_type_node,
> > +						 (m & (k - 1)) * prec));
> 
> Some helpers to build the tree to select a sub-vector would be nice
> (I remember seeing this kind of pattern elsewhere).

Ok, I'll try something.

> > +		  new_stmt
> > +		    = gimple_build_assign_with_ops (TREE_CODE (t),
> > +						    make_ssa_name (vectype,
> > +								   NULL),
> > +						    t, NULL_TREE);
> 
> For SINGLE_RHS assigns I prefer gimple_build_assign.

Okay.

> > +
> > +  /* Update the exception handling table with the vector stmt if necessary.  */
> > +  if (maybe_clean_or_replace_eh_stmt (stmt, *vec_stmt))
> > +    gimple_purge_dead_eh_edges (gimple_bb (stmt));
> 
> But you've early-outed on throwing stmts?  Generally this shouldn't 
> happen.

This is again a copy from vectorizable_call.  So, do you think it can
be dropped there too?

> Overall it looks good - it would be nice to split out and commit
> separately the IPA cloning infrastructure re-org (and the expr.c hunk).
> 
> The LTO issue needs to be addressed - the simplest thing to me looks
> to defer cloning to LTRANS stage.

Yeah, but the start should be to handle the internal calls that are used
everywhere now by #pragma omp simd too, and ubsan etc.

	Jakub
Richard Biener Nov. 22, 2013, 12:25 p.m. UTC | #3
On Fri, 22 Nov 2013, Jakub Jelinek wrote:

> On Fri, Nov 22, 2013 at 11:08:41AM +0100, Richard Biener wrote:
> > > @@ -284,6 +382,12 @@ public:
> > >    /* Declaration node used to be clone of. */
> > >    tree former_clone_of;
> > >  
> > > +  /* If this is a SIMD clone, this points to the SIMD specific
> > > +     information for it.  */
> > > +  struct cgraph_simd_clone *simdclone;
> > > +  /* If this function has SIMD clones, this points to the first clone.  */
> > > +  struct cgraph_node *simd_clones;
> > > +
> > 
> > I wonder how you run all of this through LTO (I'll see below I guess ;))
> 
> It doesn't work, as in, all the added testcases work just fine without -flto
> and all of them ICE with -flto, but there are multiple known issues with LTO
> before that (internal fns, etc.).  More below.
> 
> > The expr.c hunk is also ok independently of the patch.
> 
> Ok, thanks (though without the rest of the patch probably nothing emits it).
> 
> > > @@ -3758,6 +3772,124 @@ ipa_modify_call_arguments (struct cgraph
> > >    free_dominance_info (CDI_DOMINATORS);
> > >  }
> > 
> > You've run the above through Martin IIRC, but ...
> 
> Aldy did.
> 
> > > +/* If the expression *EXPR should be replaced by a reduction of a parameter, do
> > > +   so.  ADJUSTMENTS is a pointer to a vector of adjustments.  CONVERT
> > > +   specifies whether the function should care about type incompatibility the
> > > +   current and new expressions.  If it is false, the function will leave
> > > +   incompatibility issues to the caller.  Return true iff the expression
> > > +   was modified. */
> > > +
> > > +bool
> > > +ipa_modify_expr (tree *expr, bool convert,
> > > +		 ipa_parm_adjustment_vec adjustments)
> > > +{
> > > +  struct ipa_parm_adjustment *cand
> > > +    = ipa_get_adjustment_candidate (&expr, &convert, adjustments, false);
> > > +  if (!cand)
> > > +    return false;
> > > +
> > > +  tree src;
> > > +  if (cand->by_ref)
> > > +    src = build_simple_mem_ref (cand->new_decl);
> > 
> > is this function mostly copied from elsewhere?  Because
> > using build_simple_mem_ref always smells like possible TBAA problems.
> 
> Perhaps, but this is just code reorg, the same
> 
> -  if (cand->by_ref)
> -    src = build_simple_mem_ref (cand->reduction);
> -  else
> -    src = cand->reduction;
> 
> used to sit in sra_ipa_modify_expr before.
> 
> > 
> > > +  else
> > > +    src = cand->new_decl;
> > > +
> > > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > > +    {
> > > +      fprintf (dump_file, "About to replace expr ");
> > > +      print_generic_expr (dump_file, *expr, 0);
> > > +      fprintf (dump_file, " with ");
> > > +      print_generic_expr (dump_file, src, 0);
> > > +      fprintf (dump_file, "\n");
> > > +    }
> > > +
> > > +  if (convert && !useless_type_conversion_p (TREE_TYPE (*expr), cand->type))
> > > +    {
> > > +      tree vce = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (*expr), src);
> > > +      *expr = vce;
> > 
> > Why build1 and not fold it?  I assume from above you either have a plain 
> > decl (cand->new_decl) or a MEM_REF.  For both cases simply folding
> > the VCE into a MEM_REF works.
> 
> Again, preexisting code from sra_ipa_modify_expr.  Can it be changed
> incrementally/independently of this?
> 
> > > +    }
> > > +  else
> > > +    *expr = src;
> > > +  return true;
> > > +}
> > > +
> > > +/* If T is an SSA_NAME, return NULL if it is not a default def or
> > > +   return its base variable if it is.  If IGNORE_DEFAULT_DEF is true,
> > > +   the base variable is always returned, regardless if it is a default
> > > +   def.  Return T if it is not an SSA_NAME.  */
> > > +
> > > +static tree
> > > +get_ssa_base_param (tree t, bool ignore_default_def)
> > > +{
> > > +  if (TREE_CODE (t) == SSA_NAME)
> > > +    {
> > > +      if (ignore_default_def || SSA_NAME_IS_DEFAULT_DEF (t))
> > > +	return SSA_NAME_VAR (t);
> > > +      else
> > > +	return NULL_TREE;
> > > +    }
> > > +  return t;
> > > +}
> > 
> > This function will return non-NULL for non-PARMs - is that intended?
> 
> Again, seems to be preexisting code from tree-sra.c.  Aldy/Martin?
> 
> > > +  /* Ignore
> > > +     #pragma omp declare simd
> > > +     extern int foo ();
> > > +     in C, there we don't know the argument types at all.  */
> > > +  if (!node->definition
> > > +      && TYPE_ARG_TYPES (TREE_TYPE (node->decl)) == NULL_TREE)
> > > +    return;
> > 
> > I wonder if you want to diagnose this case (but where?  best during
> > parsing if that is allowed).
> 
> It isn't invalid per the standard, though of course if you have
> #pragma omp declare simd
> int foo ();
> you can't supply any clauses that refer to parameters (thus, all are assumed
> to be vector arguments.  If the function is defined locally and supplies
> arguments there, it will have DECL_ARGUMENTS and can be handled easily,
> otherwise I just chose to punt, it is too hard for too little gain.
> Perhaps could warn with -Wopenmp-simd about it.  I mean to guard also
> the other warnings about inability to emit simd clones with -Wopenmp-simd.
> 
> > > +      if (count == 0)
> > > +	continue;
> > > +
> > > +      for (int i = 0; i < count * 2; i++)
> > 
> > Here (and also elsewhere) the patch could do with a few extra
> > comments what is happening.
> 
> Ok.
> 
> > > --- gcc/passes.def	(.../trunk)	(revision 205223)
> > > +++ gcc/passes.def	(.../branches/gomp-4_0-branch)	(revision 205231)
> > > @@ -97,6 +97,7 @@ along with GCC; see the file COPYING3.
> > >        NEXT_PASS (pass_feedback_split_functions);
> > >    POP_INSERT_PASSES ()
> > >    NEXT_PASS (pass_ipa_increase_alignment);
> > > +  NEXT_PASS (pass_omp_simd_clone);
> > >    NEXT_PASS (pass_ipa_tm);
> > >    NEXT_PASS (pass_ipa_lower_emutls);
> > >    TERMINATE_PASS_LIST ()
> > 
> > So clones are created before streaming LTO.  You do have vect.exp
> > testcases that are also run through -flto but does it actually
> > "work" there?  I remember seeing changes to cgraph unreachable
> > node removal based on some flag that isn't streamed, no?
> 
> Aldy has done the pass placement, I wonder also whether it wouldn't be
> best to put the OpenMP cloning as the very last IPA pass where all the other
> cloning etc. is already done.
> Right now we want to punt on IPA-CP/IPA-SRA etc. cloning of
> #pragma omp declare simd functions, because if the simd clones are created
> first, then cloning the origins and adjusting calls to them would lead to
> the simd clones not actually being used, and if simd clones are created
> late, on the other side the code isn't able to adjust "omp declare simd"
> attribute (hopefully it could be taught at least e.g. about removing
> arguments, either because they are unused or because they can be assumed
> to be constant, we perhaps could punt only if IPA cloning wants to replace
> an argument with something else).

If you don't need gimple bodies then doing a real IPA pass is possible
but I don't see any advantages as all clones will not yet be referenced
so they are not interesting to any other IPA pass or partitioning.

Doing a late simple IPA pass (the "IPA" passes that LTRANS executes)
would be the easiest IMHO and should side-step all LTO issues nicely.

> > > +		      tree fndecl = gimple_call_fndecl (stmt), op;
> > > +		      if (fndecl != NULL_TREE)
> > > +			{
> > > +			  struct cgraph_node *node = cgraph_get_node (fndecl);
> > > +			  if (node != NULL && node->simd_clones != NULL)
> > 
> > So you use node->simd_clones which also need LTO streaming.
> > 
> > What's the reason you cannot defer SIMD cloning to LTRANS stage
> > as simple IPA pass next to IPA-PTA?
> 
> Yeah, see above.
> > 
> > > +			    {
> > > +			      unsigned int j, n = gimple_call_num_args (stmt);
> > > +			      for (j = 0; j < n; j++)
> > > +				{
> > > +				  op = gimple_call_arg (stmt, j);
> > > +				  if (DECL_P (op)
> > > +				      || (REFERENCE_CLASS_P (op)
> > > +					  && get_base_address (op)))
> > > +				    break;
> > > +				}
> > > +			      op = gimple_call_lhs (stmt);
> > > +			      /* Ignore #pragma omp declare simd functions
> > > +				 if they don't have data references in the
> > > +				 call stmt itself.  */
> > > +			      if (j == n
> > > +				  && !(op
> > > +				       && (DECL_P (op)
> > > +					   || (REFERENCE_CLASS_P (op)
> > > +					       && get_base_address (op)))))
> > > +				continue;
> > 
> > Hmm.  I guess I have an idea now how to "better" support calls in
> > data-ref/dependence analysis.  The above is fine for now - you
> > might want to dump sth here if you fail because datarefs in a declare
> > simd fn call.
> 
> Okay.
> 
> > > +	      if (is_gimple_call (stmt))
> > > +		{
> > > +		  /* Ignore calls with no lhs.  These must be calls to
> > > +		     #pragma omp simd functions, and what vectorization factor
> > > +		     it really needs can't be determined until
> > > +		     vectorizable_simd_clone_call.  */
> > 
> > Ick - that's bad.  Well, or rather it doesn't participate in
> > vectorization factor determining then, resulting in missed
> > vectorizations eventually.  You basically say "any vect factor is ok"
> > here?
> 
> Right.  The thing is, if there is no lhs, I really don't know how it will
> participate in the vectorization factor decision, and won't know it until
> the vectorizable_simd_clone_call call, because whether a particular
> clone is usable depends on which of the arguments are uniform, linear (with
> what linear step) and tons of other things.
> Perhaps if there is just one simd clone or all simd clones have some
> non-empty set of arguments all without uniform/linear clauses, then we could
> pick the smallest of those surely vector args as the one for determining
> vectorization factor.  If those arguments have internal def, then the type
> will be used already somewhere else in the loop to determine vf, so it is
> only about parameters that are passed constant/external def values, but are
> required to be in vector parameters.  But I believe
> vectorizable_simd_clone_call can handle those just fine, say if you have
> all types in the loop long and thus vf decisions are only for long,
> so for AVX2 say vf = 4, then if you have
> #pragma omp declare simd uniform (a) aligned (a : 32) linear (b)
> void foo (long *a, long b, int c);
> and pass constant 23 to it, then if there is a simdlen(4) clone (will be
> on i?86/x86_64), then the last argument is passed in V4SImode parameter
> and the code should handle it fine.  Similarly if all types are int
> and there is a vector long argument passed a constant (or external def),
> it will be passed in two parameters, each one containing half, and the
> function should handle that too.
> > 
> > > +		  if (STMT_VINFO_VECTYPE (stmt_info) == NULL_TREE)
> > > +		    {
> > > +		      unsigned int j, n = gimple_call_num_args (stmt);
> > > +		      for (j = 0; j < n; j++)
> > > +			{
> > > +			  scalar_type = TREE_TYPE (gimple_call_arg (stmt, j));
> > > +			  vectype = get_vectype_for_scalar_type (scalar_type);
> > > +			  if (vectype)
> > > +			    {
> > > +			      STMT_VINFO_VECTYPE (stmt_info) = vectype;
> > > +			      break;
> > > +			    }
> > > +			}
> > > +		    }
> > > +		  if (STMT_VINFO_VECTYPE (stmt_info) != NULL_TREE)
> > > +		    {
> > > +		      if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
> > > +			{
> > > +			  pattern_def_seq = NULL;
> > > +			  gsi_next (&si);
> > > +			}
> > > +		      continue;
> > > +		    }
> > 
> > Both cases above need comments - why do you chose the first param
> > for determining STMT_VINFO_VECTYPE?  Isn't STMT_VINFO_VECTYPE
> > completely irrelevant for calls w/o LHS?  Answer: yes it is!
> 
> It is completely irrelevant, yes.
> 
> > I'd have expected an unconditional continue here (and leave
> > STMT_VINFO_VECTYPE == NULL - fact is that the vector type of
> > the argument is determined by its definition and thus may
> > be different from what you record here anyway).
> 
> Unfortunately it doesn't work (tried that).  The way all the
> vectorizable_* functions are called in sequence, most of them
> actually look at STMT_VINFO_VECTYPE before bailing out because
> they are for stmts that aren't simd clone calls and thus ICE/segfault.
> It was much easier to pass some non-NULL value than to change all of them.

Move vectorizable_simd_function first ;)

Or assign a random type (but remove the odd code looking at some
random parameters...)

> > > +  if (stmt_can_throw_internal (stmt))
> > > +    return false;
> > 
> > Can't happen (loop form checks).
> 
> But vectorizable_call has the same call.  So shall both be removed?

Yeah, should probably be moved to a generic place for safety.

> > > +  vectype = STMT_VINFO_VECTYPE (stmt_info);
> > 
> > See above - questionable if this doesn't result from looking at
> > the LHS.
> 
> This particular function just loads it into a variable and uses
> only if it has lhs.

yeah, seen that later

> > > +      if (thisarginfo.vectype != NULL_TREE
> > > +	  && loop_vinfo
> > > +	  && TREE_CODE (op) == SSA_NAME
> > > +	  && simple_iv (loop, loop_containing_stmt (stmt), op, &iv, false)
> > > +	  && tree_fits_shwi_p (iv.step))
> > > +	{
> > > +	  thisarginfo.linear_step = tree_to_shwi (iv.step);
> > 
> > Hmm, you should check thisarginfo.dt instead (I assume this case
> > is for induction/reduction defs)?  In this case you also should
> > use STMT_VINFO_LOOP_PHI_EVOLUTION_PART and not re-analyze via simple_iv.
> 
> I can try that.
> > 
> > > +	  thisarginfo.op = iv.base;
> > > +	}
> > > +      else if (thisarginfo.vectype == NULL_TREE
> > > +	       && POINTER_TYPE_P (TREE_TYPE (op)))
> > > +	thisarginfo.align = get_pointer_alignment (op) / BITS_PER_UNIT;
> > 
> > So this is for dt_external defs?
> 
> I guess even both vect_constant_def and vect_external_def, simply something
> that is uniform.
> 
> > Please switch on thisarginfo.dt here - that more naturally explains
> > what you are doing (otherwise this definitely misses a comment).
> 
> > > +      this_badness += target_badness * 512;
> > > +      /* FORNOW: Have to add code to add the mask argument.  */
> > > +      if (n->simdclone->inbranch)
> > > +	continue;
> > 
> > We don't support if-converting calls anyway, no?
> 
> Not yet.  Supporting them I guess depends on the
> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01268.html
> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01437.html
> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01550.html
> series.  With that infrastructure, I think we could e.g. represent
> the conditional calls as MASK_CALL internal call that would have
> a mask argument (like MASK_LOAD/STORE), then ADDR_EXPR of the
> function decl that has simd clones, plus the original arguments,
> or something similar, then we'd just extract the function decl
> from it in this function and just vectorize the mask argument
> too and pass it through as the last argument (or set of arguments)
> to the inbranch simd clone.
> 
> > > +      for (i = 0; i < nargs; i++)
> > > +	{
> > > +	  switch (n->simdclone->args[i].arg_type)
> > > +	    {
> > > +	    case SIMD_CLONE_ARG_TYPE_VECTOR:
> > > +	      if (!useless_type_conversion_p
> > > +		     (n->simdclone->args[i].orig_type,
> > > +		      TREE_TYPE (gimple_call_arg (stmt, i))))
> > > +		i = -1;
> > 
> > But you don't verify the vectype against the clone vectype?
> 
> The code can handle vector narrowing or widening, splitting
> into multiple arguments etc.  If the clone exist, we know the
> corresponding vector type exists, so does the arginfo[i].vectype
> that the vectorizer gives us the argument in.
> The above only handles the case where arguments are promoted
> from the types in TYPE_ARG_TYPES of the call/DECL_ARGUMENTS
> to something wider in the GIMPLE_CALL (happens for short/char
> arguments apparently).  The above code just punts on it, I don't
> want to have in that function yet another full copy of narrowing/widening
> conversions.  The plan was (so far unimplemented) to handle this
> in tree-vect-patterns.c, if we have say char argument and pass an
> int to it, if the argument is constant, we'd just fold_convert it
> to the right type, if there is widening right before it, we'd use
> the unwidened SSA_NAME instead, otherwise narrow.  Then vf
> determination etc. would handle it right.  Does that look reasonable to you?

The above tests scalar types, not arginfo[].vectype.  I'm concerned
about mismatches there (and miss such check).  There are surely
cases where (with multiple arguments) you cannot create a match.

We can of course add checking if we discover a testcase ;)

> > > +	      else if (arginfo[i].vectype == NULL_TREE
> > 
> > I'd like to see checks based on the def type, not vectype.
> 
> Ok.
> > 
> > > +		       || arginfo[i].linear_step)
> > > +		this_badness += 64;
> > > +	      break;
> > > +	    case SIMD_CLONE_ARG_TYPE_UNIFORM:
> > > +	      if (arginfo[i].vectype != NULL_TREE)
> > 
> > Likewise (and below, too).
> 
> > > +  if (!vec_stmt) /* transformation not required.  */
> > > +    {
> > > +      STMT_VINFO_TYPE (stmt_info) = call_simd_clone_vec_info_type;
> > > +      if (dump_enabled_p ())
> > > +	dump_printf_loc (MSG_NOTE, vect_location,
> > > +			 "=== vectorizable_simd_clone_call ===\n");
> > > +/*      vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL); */
> > > +      arginfo.release ();
> > 
> > Please save the result from the analysis (selecting the simd clone)
> > in the stmt_vinfo and skip the analysis during transform phase.
> 
> Just stick there the selected cgraph_node?

Works for me.

> As for the cost computation commented out above, it is hard to predict it
> right, probably we should at least add the cost of the scalar call, so
> the vectorizable function isn't considered cheaper.  But more than that?

No idea - this is the wrong function to do a cost model (other than
selecting between different applicable simd clones).

> > > +		      vec_oprnd0
> > > +			= build3 (BIT_FIELD_REF, atype, vec_oprnd0,
> > > +				  build_int_cst (integer_type_node, prec),
> > > +				  build_int_cst (integer_type_node,
> > > +						 (m & (k - 1)) * prec));
> > 
> > Some helpers to build the tree to select a sub-vector would be nice
> > (I remember seeing this kind of pattern elsewhere).
> 
> Ok, I'll try something.
> 
> > > +		  new_stmt
> > > +		    = gimple_build_assign_with_ops (TREE_CODE (t),
> > > +						    make_ssa_name (vectype,
> > > +								   NULL),
> > > +						    t, NULL_TREE);
> > 
> > For SINGLE_RHS assigns I prefer gimple_build_assign.
> 
> Okay.
> 
> > > +
> > > +  /* Update the exception handling table with the vector stmt if necessary.  */
> > > +  if (maybe_clean_or_replace_eh_stmt (stmt, *vec_stmt))
> > > +    gimple_purge_dead_eh_edges (gimple_bb (stmt));
> > 
> > But you've early-outed on throwing stmts?  Generally this shouldn't 
> > happen.
> 
> This is again a copy from vectorizable_call.  So, do you think it can
> be dropped there too?

Yes.

> > Overall it looks good - it would be nice to split out and commit
> > separately the IPA cloning infrastructure re-org (and the expr.c hunk).
> > 
> > The LTO issue needs to be addressed - the simplest thing to me looks
> > to defer cloning to LTRANS stage.
> 
> Yeah, but the start should be to handle the internal calls that are used
> everywhere now by #pragma omp simd too, and ubsan etc.

Correct - there is a bugreport about it.  The solution is to completely
ignore them when building the cgraph (and fix the fallout - heh).
I can give it a try again.

Richard.
Martin Jambor Nov. 22, 2013, 1:33 p.m. UTC | #4
Hi,

On Fri, Nov 22, 2013 at 12:19:33PM +0100, Jakub Jelinek wrote:
> On Fri, Nov 22, 2013 at 11:08:41AM +0100, Richard Biener wrote:
> > > @@ -284,6 +382,12 @@ public:
> > >    /* Declaration node used to be clone of. */
> > >    tree former_clone_of;
> > >  
> > > +  /* If this is a SIMD clone, this points to the SIMD specific
> > > +     information for it.  */
> > > +  struct cgraph_simd_clone *simdclone;
> > > +  /* If this function has SIMD clones, this points to the first clone.  */
> > > +  struct cgraph_node *simd_clones;
> > > +
> > 
> > I wonder how you run all of this through LTO (I'll see below I guess ;))
> 
> It doesn't work, as in, all the added testcases work just fine without -flto
> and all of them ICE with -flto, but there are multiple known issues with LTO
> before that (internal fns, etc.).  More below.
> 
> > The expr.c hunk is also ok independently of the patch.
> 
> Ok, thanks (though without the rest of the patch probably nothing emits it).
> 
> > > @@ -3758,6 +3772,124 @@ ipa_modify_call_arguments (struct cgraph
> > >    free_dominance_info (CDI_DOMINATORS);
> > >  }
> > 
> > You've run the above through Martin IIRC, but ...
> 
> Aldy did.
> 
> > > +/* If the expression *EXPR should be replaced by a reduction of a parameter, do
> > > +   so.  ADJUSTMENTS is a pointer to a vector of adjustments.  CONVERT
> > > +   specifies whether the function should care about type incompatibility the
> > > +   current and new expressions.  If it is false, the function will leave
> > > +   incompatibility issues to the caller.  Return true iff the expression
> > > +   was modified. */
> > > +
> > > +bool
> > > +ipa_modify_expr (tree *expr, bool convert,
> > > +		 ipa_parm_adjustment_vec adjustments)
> > > +{
> > > +  struct ipa_parm_adjustment *cand
> > > +    = ipa_get_adjustment_candidate (&expr, &convert, adjustments, false);
> > > +  if (!cand)
> > > +    return false;
> > > +
> > > +  tree src;
> > > +  if (cand->by_ref)
> > > +    src = build_simple_mem_ref (cand->new_decl);
> > 
> > is this function mostly copied from elsewhere?  Because
> > using build_simple_mem_ref always smells like possible TBAA problems.
> 
> Perhaps, but this is just code reorg, the same
> 
> -  if (cand->by_ref)
> -    src = build_simple_mem_ref (cand->reduction);
> -  else
> -    src = cand->reduction;
> 
> used to sit in sra_ipa_modify_expr before.

IPA-SRA (in splice_param_accesses) makes sure that it only pushes
dereferences to the caller (which are created by this code) when all
dereferences in the calle have the same reference_alias_ptr_type.  THe
dereference is then made in type of one such dereference.  I hope that
means there are no TBAA issues.

> 
> > 
> > > +  else
> > > +    src = cand->new_decl;
> > > +
> > > +  if (dump_file && (dump_flags & TDF_DETAILS))
> > > +    {
> > > +      fprintf (dump_file, "About to replace expr ");
> > > +      print_generic_expr (dump_file, *expr, 0);
> > > +      fprintf (dump_file, " with ");
> > > +      print_generic_expr (dump_file, src, 0);
> > > +      fprintf (dump_file, "\n");
> > > +    }
> > > +
> > > +  if (convert && !useless_type_conversion_p (TREE_TYPE (*expr), cand->type))
> > > +    {
> > > +      tree vce = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (*expr), src);
> > > +      *expr = vce;
> > 
> > Why build1 and not fold it?  I assume from above you either have a plain 
> > decl (cand->new_decl) or a MEM_REF.  For both cases simply folding
> > the VCE into a MEM_REF works.
> 
> Again, preexisting code from sra_ipa_modify_expr.  Can it be changed
> incrementally/independently of this?

I'm not sure, perhaps it made sense before we had MEM_REF and it was
not converted... or it was simply always a bug.  I can fix this after
the merge, not sure whether now of in the next stage1 though.

> 
> > > +    }
> > > +  else
> > > +    *expr = src;
> > > +  return true;
> > > +}
> > > +
> > > +/* If T is an SSA_NAME, return NULL if it is not a default def or
> > > +   return its base variable if it is.  If IGNORE_DEFAULT_DEF is true,
> > > +   the base variable is always returned, regardless if it is a default
> > > +   def.  Return T if it is not an SSA_NAME.  */
> > > +
> > > +static tree
> > > +get_ssa_base_param (tree t, bool ignore_default_def)
> > > +{
> > > +  if (TREE_CODE (t) == SSA_NAME)
> > > +    {
> > > +      if (ignore_default_def || SSA_NAME_IS_DEFAULT_DEF (t))
> > > +	return SSA_NAME_VAR (t);
> > > +      else
> > > +	return NULL_TREE;
> > > +    }
> > > +  return t;
> > > +}
> > 
> > This function will return non-NULL for non-PARMs - is that intended?
> 
> Again, seems to be preexisting code from tree-sra.c.  Aldy/Martin?

Yeah, at least in the old form it is either checked later on or it
does not matter (we just use the DECL_UID to clear its bit in bitmap
of candidates).  But it would probaly make sense to move the check
there, again as a followup, whether now or for 4.10.

Thanks,

Martin
diff mbox

Patch

--- gcc/cgraph.h	(.../trunk)	(revision 205223)
+++ gcc/cgraph.h	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -256,6 +261,99 @@  struct GTY(()) cgraph_clone_info
   bitmap combined_args_to_skip;
 };
 
+enum cgraph_simd_clone_arg_type
+{
+  SIMD_CLONE_ARG_TYPE_VECTOR,
+  SIMD_CLONE_ARG_TYPE_UNIFORM,
+  SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP,
+  SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP,
+  SIMD_CLONE_ARG_TYPE_MASK
+};
+
+/* Function arguments in the original function of a SIMD clone.
+   Supplementary data for `struct simd_clone'.  */
+
+struct GTY(()) cgraph_simd_clone_arg {
+  /* Original function argument as it originally existed in
+     DECL_ARGUMENTS.  */
+  tree orig_arg;
+
+  /* orig_arg's function (or for extern functions type from
+     TYPE_ARG_TYPES).  */
+  tree orig_type;
+
+  /* If argument is a vector, this holds the vector version of
+     orig_arg that after adjusting the argument types will live in
+     DECL_ARGUMENTS.  Otherwise, this is NULL.
+
+     This basically holds:
+       vector(simdlen) __typeof__(orig_arg) new_arg.  */
+  tree vector_arg;
+
+  /* vector_arg's type (or for extern functions new vector type.  */
+  tree vector_type;
+
+  /* If argument is a vector, this holds the array where the simd
+     argument is held while executing the simd clone function.  This
+     is a local variable in the cloned function.  Its content is
+     copied from vector_arg upon entry to the clone.
+
+     This basically holds:
+       __typeof__(orig_arg) simd_array[simdlen].  */
+  tree simd_array;
+
+  /* A SIMD clone's argument can be either linear (constant or
+     variable), uniform, or vector.  */
+  enum cgraph_simd_clone_arg_type arg_type;
+
+  /* For arg_type SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP this is
+     the constant linear step, if arg_type is
+     SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP, this is index of
+     the uniform argument holding the step, otherwise 0.  */
+  HOST_WIDE_INT linear_step;
+
+  /* Variable alignment if available, otherwise 0.  */
+  unsigned int alignment;
+};
+
+/* Specific data for a SIMD function clone.  */
+
+struct GTY(()) cgraph_simd_clone {
+  /* Number of words in the SIMD lane associated with this clone.  */
+  unsigned int simdlen;
+
+  /* Number of annotated function arguments in `args'.  This is
+     usually the number of named arguments in FNDECL.  */
+  unsigned int nargs;
+
+  /* Max hardware vector size in bits for integral vectors.  */
+  unsigned int vecsize_int;
+
+  /* Max hardware vector size in bits for floating point vectors.  */
+  unsigned int vecsize_float;
+
+  /* The mangling character for a given vector size.  This is is used
+     to determine the ISA mangling bit as specified in the Intel
+     Vector ABI.  */
+  unsigned char vecsize_mangle;
+
+  /* True if this is the masked, in-branch version of the clone,
+     otherwise false.  */
+  unsigned int inbranch : 1;
+
+  /* True if this is a Cilk Plus variant.  */
+  unsigned int cilk_elemental : 1;
+
+  /* Doubly linked list of SIMD clones.  */
+  struct cgraph_node *prev_clone, *next_clone;
+
+  /* Original cgraph node the SIMD clones were created for.  */
+  struct cgraph_node *origin;
+
+  /* Annotated function arguments for the original function.  */
+  struct cgraph_simd_clone_arg GTY((length ("%h.nargs"))) args[1];
+};
+
 
 /* The cgraph data structure.
    Each function decl has assigned cgraph_node listing callees and callers.  */
@@ -284,6 +382,12 @@  public:
   /* Declaration node used to be clone of. */
   tree former_clone_of;
 
+  /* If this is a SIMD clone, this points to the SIMD specific
+     information for it.  */
+  struct cgraph_simd_clone *simdclone;
+  /* If this function has SIMD clones, this points to the first clone.  */
+  struct cgraph_node *simd_clones;
+
   /* Interprocedural passes scheduled to have their transform functions
      applied next time we execute local pass on them.  We maintain it
      per-function in order to allow IPA passes to introduce new functions.  */
--- gcc/config/i386/i386.c	(.../trunk)	(revision 205223)
+++ gcc/config/i386/i386.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -43683,6 +43683,172 @@  ix86_memmodel_check (unsigned HOST_WIDE_
   return val;
 }
 
+/* Set CLONEI->vecsize_mangle, CLONEI->vecsize_int,
+   CLONEI->vecsize_float and if CLONEI->simdlen is 0, also
+   CLONEI->simdlen.  Return 0 if SIMD clones shouldn't be emitted,
+   or number of vecsize_mangle variants that should be emitted.  */
+
+static int
+ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
+					     struct cgraph_simd_clone *clonei,
+					     tree base_type, int num)
+{
+  int ret = 1;
+
+  if (clonei->simdlen
+      && (clonei->simdlen < 2
+	  || clonei->simdlen > 16
+	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
+    {
+      warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		  "unsupported simdlen %d\n", clonei->simdlen);
+      return 0;
+    }
+
+  tree ret_type = TREE_TYPE (TREE_TYPE (node->decl));
+  if (TREE_CODE (ret_type) != VOID_TYPE)
+    switch (TYPE_MODE (ret_type))
+      {
+      case QImode:
+      case HImode:
+      case SImode:
+      case DImode:
+      case SFmode:
+      case DFmode:
+      /* case SCmode: */
+      /* case DCmode: */
+	break;
+      default:
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		    "unsupported return type %qT for simd\n", ret_type);
+	return 0;
+      }
+
+  tree t;
+  int i;
+
+  for (t = DECL_ARGUMENTS (node->decl), i = 0; t; t = DECL_CHAIN (t), i++)
+    /* FIXME: Shouldn't we allow such arguments if they are uniform?  */
+    switch (TYPE_MODE (TREE_TYPE (t)))
+      {
+      case QImode:
+      case HImode:
+      case SImode:
+      case DImode:
+      case SFmode:
+      case DFmode:
+      /* case SCmode: */
+      /* case DCmode: */
+	break;
+      default:
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		    "unsupported argument type %qT for simd\n", TREE_TYPE (t));
+	return 0;
+      }
+
+  if (clonei->cilk_elemental)
+    {
+      /* Parse here processor clause.  If not present, default to 'b'.  */
+      clonei->vecsize_mangle = 'b';
+    }
+  else
+    {
+      clonei->vecsize_mangle = "bcd"[num];
+      ret = 3;
+    }
+  switch (clonei->vecsize_mangle)
+    {
+    case 'b':
+      clonei->vecsize_int = 128;
+      clonei->vecsize_float = 128;
+      break;
+    case 'c':
+      clonei->vecsize_int = 128;
+      clonei->vecsize_float = 256;
+      break;
+    case 'd':
+      clonei->vecsize_int = 256;
+      clonei->vecsize_float = 256;
+      break;
+    }
+  if (clonei->simdlen == 0)
+    {
+      if (SCALAR_INT_MODE_P (TYPE_MODE (base_type)))
+	clonei->simdlen = clonei->vecsize_int;
+      else
+	clonei->simdlen = clonei->vecsize_float;
+      clonei->simdlen /= GET_MODE_BITSIZE (TYPE_MODE (base_type));
+      if (clonei->simdlen > 16)
+	clonei->simdlen = 16;
+    }
+  return ret;
+}
+
+/* Add target attribute to SIMD clone NODE if needed.  */
+
+static void
+ix86_simd_clone_adjust (struct cgraph_node *node)
+{
+  const char *str = NULL;
+  gcc_assert (node->decl == cfun->decl);
+  switch (node->simdclone->vecsize_mangle)
+    {
+    case 'b':
+      if (!TARGET_SSE2)
+	str = "sse2";
+      break;
+    case 'c':
+      if (!TARGET_AVX)
+	str = "avx";
+      break;
+    case 'd':
+      if (!TARGET_AVX2)
+	str = "avx2";
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  if (str == NULL)
+    return;
+  push_cfun (NULL);
+  tree args = build_tree_list (NULL_TREE, build_string (strlen (str), str));
+  bool ok = ix86_valid_target_attribute_p (node->decl, NULL, args, 0);
+  gcc_assert (ok);
+  pop_cfun ();
+  ix86_previous_fndecl = NULL_TREE;
+  ix86_set_current_function (node->decl);
+}
+
+/* If SIMD clone NODE can't be used in a vectorized loop
+   in current function, return -1, otherwise return a badness of using it
+   (0 if it is most desirable from vecsize_mangle point of view, 1
+   slightly less desirable, etc.).  */
+
+static int
+ix86_simd_clone_usable (struct cgraph_node *node)
+{
+  switch (node->simdclone->vecsize_mangle)
+    {
+    case 'b':
+      if (!TARGET_SSE2)
+	return -1;
+      if (!TARGET_AVX)
+	return 0;
+      return TARGET_AVX2 ? 2 : 1;
+    case 'c':
+      if (!TARGET_AVX)
+	return -1;
+      return TARGET_AVX2 ? 1 : 0;
+      break;
+    case 'd':
+      if (!TARGET_AVX2)
+	return -1;
+      return 0;
+    default:
+      gcc_unreachable ();
+    }
+}
+
 /* Implement TARGET_FLOAT_EXCEPTIONS_ROUNDING_SUPPORTED_P.  */
 
 static bool
@@ -44171,6 +44337,18 @@  ix86_atomic_assign_expand_fenv (tree *ho
 #undef TARGET_SPILL_CLASS
 #define TARGET_SPILL_CLASS ix86_spill_class
 
+#undef TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
+#define TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN \
+  ix86_simd_clone_compute_vecsize_and_simdlen
+
+#undef TARGET_SIMD_CLONE_ADJUST
+#define TARGET_SIMD_CLONE_ADJUST \
+  ix86_simd_clone_adjust
+
+#undef TARGET_SIMD_CLONE_USABLE
+#define TARGET_SIMD_CLONE_USABLE \
+  ix86_simd_clone_usable
+
 #undef TARGET_FLOAT_EXCEPTIONS_ROUNDING_SUPPORTED_P
 #define TARGET_FLOAT_EXCEPTIONS_ROUNDING_SUPPORTED_P \
   ix86_float_exceptions_rounding_supported_p
--- gcc/doc/tm.texi	(.../trunk)	(revision 205223)
+++ gcc/doc/tm.texi	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -5818,6 +5818,26 @@  The default is @code{NULL_TREE} which me
 loads.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN (struct cgraph_node *@var{}, struct cgraph_simd_clone *@var{}, @var{tree}, @var{int})
+This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}
+fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also
+@var{simdlen} field if it was previously 0.
+The hook should return 0 if SIMD clones shouldn't be emitted,
+or number of @var{vecsize_mangle} variants that should be emitted.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_SIMD_CLONE_ADJUST (struct cgraph_node *@var{})
+This hook should add implicit @code{attribute(target("..."))} attribute
+to SIMD clone @var{node} if needed.
+@end deftypefn
+
+@deftypefn {Target Hook} int TARGET_SIMD_CLONE_USABLE (struct cgraph_node *@var{})
+This hook should return -1 if SIMD clone @var{node} shouldn't be used
+in vectorized loops in current function, or non-negative number if it is
+usable.  In that case, the smaller the number is, the more desirable it is
+to use it.
+@end deftypefn
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
--- gcc/doc/tm.texi.in	(.../trunk)	(revision 205223)
+++ gcc/doc/tm.texi.in	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -4422,6 +4422,12 @@  address;  but often a machine-dependent
 
 @hook TARGET_VECTORIZE_BUILTIN_GATHER
 
+@hook TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN
+
+@hook TARGET_SIMD_CLONE_ADJUST
+
+@hook TARGET_SIMD_CLONE_USABLE
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
--- gcc/expr.c	(.../trunk)	(revision 205223)
+++ gcc/expr.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -6305,6 +6305,18 @@  store_constructor (tree exp, rtx target,
 	    enum machine_mode mode = GET_MODE (target);
 
 	    icode = (int) optab_handler (vec_init_optab, mode);
+	    /* Don't use vec_init<mode> if some elements have VECTOR_TYPE.  */
+	    if (icode != CODE_FOR_nothing)
+	      {
+		tree value;
+
+		FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), idx, value)
+		  if (TREE_CODE (TREE_TYPE (value)) == VECTOR_TYPE)
+		    {
+		      icode = CODE_FOR_nothing;
+		      break;
+		    }
+	      }
 	    if (icode != CODE_FOR_nothing)
 	      {
 		unsigned int i;
@@ -6382,8 +6394,8 @@  store_constructor (tree exp, rtx target,
 
 	    if (vector)
 	      {
-	        /* Vector CONSTRUCTORs should only be built from smaller
-		   vectors in the case of BLKmode vectors.  */
+		/* vec_init<mode> should not be used if there are VECTOR_TYPE
+		   elements.  */
 		gcc_assert (TREE_CODE (TREE_TYPE (value)) != VECTOR_TYPE);
 		RTVEC_ELT (vector, eltpos)
 		  = expand_normal (value);
--- gcc/ggc.h	(.../trunk)	(revision 205223)
+++ gcc/ggc.h	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -276,4 +276,11 @@  ggc_alloc_cleared_gimple_statement_stat
     ggc_internal_cleared_alloc_stat (s PASS_MEM_STAT);
 }
 
+static inline struct simd_clone *
+ggc_alloc_cleared_simd_clone_stat (size_t s MEM_STAT_DECL)
+{
+  return (struct simd_clone *)
+    ggc_internal_cleared_alloc_stat (s PASS_MEM_STAT);
+}
+
 #endif
--- gcc/ipa.c	(.../trunk)	(revision 205223)
+++ gcc/ipa.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -247,7 +247,7 @@  walk_polymorphic_call_targets (pointer_s
      hope calls to them will be devirtualized. 
 
      Again we remove them after inlining.  In late optimization some
-     devirtualization may happen, but it is not importnat since we won't inline
+     devirtualization may happen, but it is not important since we won't inline
      the call. In theory early opts and IPA should work out all important cases.
 
    - virtual clones needs bodies of their origins for later materialization;
@@ -275,7 +275,7 @@  walk_polymorphic_call_targets (pointer_s
    by reachable symbols or origins of clones).  The queue is represented
    as linked list by AUX pointer terminated by 1.
 
-   A the end we keep all reachable symbols. For symbols in boundary we always
+   At the end we keep all reachable symbols. For symbols in boundary we always
    turn definition into a declaration, but we may keep function body around
    based on body_needed_for_clonning
 
@@ -427,6 +427,19 @@  symtab_remove_unreachable_nodes (bool be
 		      enqueue_node (cnode, &first, reachable);
 		    }
 		}
+
+	    }
+	  /* If any reachable function has simd clones, mark them as
+	     reachable as well.  */
+	  if (cnode->simd_clones)
+	    {
+	      cgraph_node *next;
+	      for (next = cnode->simd_clones;
+		   next;
+		   next = next->simdclone->next_clone)
+		if (in_boundary_p
+		    || !pointer_set_insert (reachable, next))
+		  enqueue_node (next, &first, reachable);
 	    }
 	}
       /* When we see constructor of external variable, keep referred nodes in the
--- gcc/ipa-cp.c	(.../trunk)	(revision 205223)
+++ gcc/ipa-cp.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -429,6 +429,15 @@  determine_versionability (struct cgraph_
     reason = "not a tree_versionable_function";
   else if (cgraph_function_body_availability (node) <= AVAIL_OVERWRITABLE)
     reason = "insufficient body availability";
+  else if (node->simd_clones != NULL)
+    {
+      /* Ideally we should clone the SIMD clones themselves and create
+	 vector copies of them, so IPA-cp and SIMD clones can happily
+	 coexist, but that may not be worth the effort.  */
+      reason = "function has SIMD clones";
+    }
+  else if (node->simdclone != NULL)
+    reason = "function is SIMD clone";
 
   if (reason && dump_file && !node->alias && !node->thunk.thunk_p)
     fprintf (dump_file, "Function %s/%i is not versionable, reason: %s.\n",
@@ -695,6 +704,8 @@  initialize_node_lattices (struct cgraph_
       else
 	disable = true;
     }
+  else if (node->simdclone)
+    disable = true;
 
   if (disable || variable)
     {
--- gcc/ipa-prop.c	(.../trunk)	(revision 205223)
+++ gcc/ipa-prop.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -3355,8 +3355,8 @@  ipa_get_vector_of_formal_parms (tree fnd
 /* Return a heap allocated vector containing types of formal parameters of
    function type FNTYPE.  */
 
-static inline vec<tree> 
-get_vector_of_formal_parm_types (tree fntype)
+vec<tree>
+ipa_get_vector_of_formal_parm_types (tree fntype)
 {
   vec<tree> types;
   int count = 0;
@@ -3378,32 +3378,22 @@  get_vector_of_formal_parm_types (tree fn
    base_index field.  */
 
 void
-ipa_modify_formal_parameters (tree fndecl, ipa_parm_adjustment_vec adjustments,
-			      const char *synth_parm_prefix)
+ipa_modify_formal_parameters (tree fndecl, ipa_parm_adjustment_vec adjustments)
 {
-  vec<tree> oparms, otypes;
-  tree orig_type, new_type = NULL;
-  tree old_arg_types, t, new_arg_types = NULL;
-  tree parm, *link = &DECL_ARGUMENTS (fndecl);
-  int i, len = adjustments.length ();
-  tree new_reversed = NULL;
-  bool care_for_types, last_parm_void;
-
-  if (!synth_parm_prefix)
-    synth_parm_prefix = "SYNTH";
-
-  oparms = ipa_get_vector_of_formal_parms (fndecl);
-  orig_type = TREE_TYPE (fndecl);
-  old_arg_types = TYPE_ARG_TYPES (orig_type);
+  vec<tree> oparms = ipa_get_vector_of_formal_parms (fndecl);
+  tree orig_type = TREE_TYPE (fndecl);
+  tree old_arg_types = TYPE_ARG_TYPES (orig_type);
 
   /* The following test is an ugly hack, some functions simply don't have any
      arguments in their type.  This is probably a bug but well... */
-  care_for_types = (old_arg_types != NULL_TREE);
+  bool care_for_types = (old_arg_types != NULL_TREE);
+  bool last_parm_void;
+  vec<tree> otypes;
   if (care_for_types)
     {
       last_parm_void = (TREE_VALUE (tree_last (old_arg_types))
 			== void_type_node);
-      otypes = get_vector_of_formal_parm_types (orig_type);
+      otypes = ipa_get_vector_of_formal_parm_types (orig_type);
       if (last_parm_void)
 	gcc_assert (oparms.length () + 1 == otypes.length ());
       else
@@ -3415,16 +3405,23 @@  ipa_modify_formal_parameters (tree fndec
       otypes.create (0);
     }
 
-  for (i = 0; i < len; i++)
+  int len = adjustments.length ();
+  tree *link = &DECL_ARGUMENTS (fndecl);
+  tree new_arg_types = NULL;
+  for (int i = 0; i < len; i++)
     {
       struct ipa_parm_adjustment *adj;
       gcc_assert (link);
 
       adj = &adjustments[i];
-      parm = oparms[adj->base_index];
+      tree parm;
+      if (adj->op == IPA_PARM_OP_NEW)
+	parm = NULL;
+      else
+	parm = oparms[adj->base_index];
       adj->base = parm;
 
-      if (adj->copy_param)
+      if (adj->op == IPA_PARM_OP_COPY)
 	{
 	  if (care_for_types)
 	    new_arg_types = tree_cons (NULL_TREE, otypes[adj->base_index],
@@ -3432,23 +3429,36 @@  ipa_modify_formal_parameters (tree fndec
 	  *link = parm;
 	  link = &DECL_CHAIN (parm);
 	}
-      else if (!adj->remove_param)
+      else if (adj->op != IPA_PARM_OP_REMOVE)
 	{
 	  tree new_parm;
 	  tree ptype;
 
-	  if (adj->by_ref)
-	    ptype = build_pointer_type (adj->type);
+	  if (adj->simdlen)
+	    {
+	      /* If we have a non-null simdlen but by_ref is true, we
+		 want a vector of pointers.  Build the vector of
+		 pointers here, not a pointer to a vector in the
+		 adj->by_ref case below.  */
+	      ptype = build_vector_type (adj->type, adj->simdlen);
+	    }
+	  else if (adj->by_ref)
+	    {
+	      ptype = build_pointer_type (adj->type);
+	    }
 	  else
-	    ptype = adj->type;
+	    {
+	      gcc_checking_assert (!adj->by_ref || adj->simdlen);
+	      ptype = adj->type;
+	    }
 
 	  if (care_for_types)
 	    new_arg_types = tree_cons (NULL_TREE, ptype, new_arg_types);
 
 	  new_parm = build_decl (UNKNOWN_LOCATION, PARM_DECL, NULL_TREE,
 				 ptype);
-	  DECL_NAME (new_parm) = create_tmp_var_name (synth_parm_prefix);
-
+	  const char *prefix = adj->arg_prefix ? adj->arg_prefix : "SYNTH";
+	  DECL_NAME (new_parm) = create_tmp_var_name (prefix);
 	  DECL_ARTIFICIAL (new_parm) = 1;
 	  DECL_ARG_TYPE (new_parm) = ptype;
 	  DECL_CONTEXT (new_parm) = fndecl;
@@ -3456,17 +3466,20 @@  ipa_modify_formal_parameters (tree fndec
 	  DECL_IGNORED_P (new_parm) = 1;
 	  layout_decl (new_parm, 0);
 
-	  adj->base = parm;
-	  adj->reduction = new_parm;
+	  if (adj->op == IPA_PARM_OP_NEW)
+	    adj->base = NULL;
+	  else
+	    adj->base = parm;
+	  adj->new_decl = new_parm;
 
 	  *link = new_parm;
-
 	  link = &DECL_CHAIN (new_parm);
 	}
     }
 
   *link = NULL_TREE;
 
+  tree new_reversed = NULL;
   if (care_for_types)
     {
       new_reversed = nreverse (new_arg_types);
@@ -3484,8 +3497,9 @@  ipa_modify_formal_parameters (tree fndec
      Exception is METHOD_TYPEs must have THIS argument.
      When we are asked to remove it, we need to build new FUNCTION_TYPE
      instead.  */
+  tree new_type = NULL;
   if (TREE_CODE (orig_type) != METHOD_TYPE
-       || (adjustments[0].copy_param
+       || (adjustments[0].op == IPA_PARM_OP_COPY
 	  && adjustments[0].base_index == 0))
     {
       new_type = build_distinct_type_copy (orig_type);
@@ -3509,7 +3523,7 @@  ipa_modify_formal_parameters (tree fndec
 
   /* This is a new type, not a copy of an old type.  Need to reassociate
      variants.  We can handle everything except the main variant lazily.  */
-  t = TYPE_MAIN_VARIANT (orig_type);
+  tree t = TYPE_MAIN_VARIANT (orig_type);
   if (orig_type != t)
     {
       TYPE_MAIN_VARIANT (new_type) = t;
@@ -3558,13 +3572,13 @@  ipa_modify_call_arguments (struct cgraph
 
       adj = &adjustments[i];
 
-      if (adj->copy_param)
+      if (adj->op == IPA_PARM_OP_COPY)
 	{
 	  tree arg = gimple_call_arg (stmt, adj->base_index);
 
 	  vargs.quick_push (arg);
 	}
-      else if (!adj->remove_param)
+      else if (adj->op != IPA_PARM_OP_REMOVE)
 	{
 	  tree expr, base, off;
 	  location_t loc;
@@ -3683,7 +3697,7 @@  ipa_modify_call_arguments (struct cgraph
 					   NULL, true, GSI_SAME_STMT);
 	  vargs.quick_push (expr);
 	}
-      if (!adj->copy_param && MAY_HAVE_DEBUG_STMTS)
+      if (adj->op != IPA_PARM_OP_COPY && MAY_HAVE_DEBUG_STMTS)
 	{
 	  unsigned int ix;
 	  tree ddecl = NULL_TREE, origin = DECL_ORIGIN (adj->base), arg;
@@ -3758,6 +3772,124 @@  ipa_modify_call_arguments (struct cgraph
   free_dominance_info (CDI_DOMINATORS);
 }
 
+/* If the expression *EXPR should be replaced by a reduction of a parameter, do
+   so.  ADJUSTMENTS is a pointer to a vector of adjustments.  CONVERT
+   specifies whether the function should care about type incompatibility the
+   current and new expressions.  If it is false, the function will leave
+   incompatibility issues to the caller.  Return true iff the expression
+   was modified. */
+
+bool
+ipa_modify_expr (tree *expr, bool convert,
+		 ipa_parm_adjustment_vec adjustments)
+{
+  struct ipa_parm_adjustment *cand
+    = ipa_get_adjustment_candidate (&expr, &convert, adjustments, false);
+  if (!cand)
+    return false;
+
+  tree src;
+  if (cand->by_ref)
+    src = build_simple_mem_ref (cand->new_decl);
+  else
+    src = cand->new_decl;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "About to replace expr ");
+      print_generic_expr (dump_file, *expr, 0);
+      fprintf (dump_file, " with ");
+      print_generic_expr (dump_file, src, 0);
+      fprintf (dump_file, "\n");
+    }
+
+  if (convert && !useless_type_conversion_p (TREE_TYPE (*expr), cand->type))
+    {
+      tree vce = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (*expr), src);
+      *expr = vce;
+    }
+  else
+    *expr = src;
+  return true;
+}
+
+/* If T is an SSA_NAME, return NULL if it is not a default def or
+   return its base variable if it is.  If IGNORE_DEFAULT_DEF is true,
+   the base variable is always returned, regardless if it is a default
+   def.  Return T if it is not an SSA_NAME.  */
+
+static tree
+get_ssa_base_param (tree t, bool ignore_default_def)
+{
+  if (TREE_CODE (t) == SSA_NAME)
+    {
+      if (ignore_default_def || SSA_NAME_IS_DEFAULT_DEF (t))
+	return SSA_NAME_VAR (t);
+      else
+	return NULL_TREE;
+    }
+  return t;
+}
+
+/* Given an expression, return an adjustment entry specifying the
+   transformation to be done on EXPR.  If no suitable adjustment entry
+   was found, returns NULL.
+
+   If IGNORE_DEFAULT_DEF is set, consider SSA_NAMEs which are not a
+   default def, otherwise bail on them.
+
+   If CONVERT is non-NULL, this function will set *CONVERT if the
+   expression provided is a component reference.  ADJUSTMENTS is the
+   adjustments vector.  */
+
+ipa_parm_adjustment *
+ipa_get_adjustment_candidate (tree **expr, bool *convert,
+			      ipa_parm_adjustment_vec adjustments,
+			      bool ignore_default_def)
+{
+  if (TREE_CODE (**expr) == BIT_FIELD_REF
+      || TREE_CODE (**expr) == IMAGPART_EXPR
+      || TREE_CODE (**expr) == REALPART_EXPR)
+    {
+      *expr = &TREE_OPERAND (**expr, 0);
+      if (convert)
+	*convert = true;
+    }
+
+  HOST_WIDE_INT offset, size, max_size;
+  tree base = get_ref_base_and_extent (**expr, &offset, &size, &max_size);
+  if (!base || size == -1 || max_size == -1)
+    return NULL;
+
+  if (TREE_CODE (base) == MEM_REF)
+    {
+      offset += mem_ref_offset (base).low * BITS_PER_UNIT;
+      base = TREE_OPERAND (base, 0);
+    }
+
+  base = get_ssa_base_param (base, ignore_default_def);
+  if (!base || TREE_CODE (base) != PARM_DECL)
+    return NULL;
+
+  struct ipa_parm_adjustment *cand = NULL;
+  unsigned int len = adjustments.length ();
+  for (unsigned i = 0; i < len; i++)
+    {
+      struct ipa_parm_adjustment *adj = &adjustments[i];
+
+      if (adj->base == base
+	  && (adj->offset == offset || adj->op == IPA_PARM_OP_REMOVE))
+	{
+	  cand = adj;
+	  break;
+	}
+    }
+
+  if (!cand || cand->op == IPA_PARM_OP_COPY || cand->op == IPA_PARM_OP_REMOVE)
+    return NULL;
+  return cand;
+}
+
 /* Return true iff BASE_INDEX is in ADJUSTMENTS more than once.  */
 
 static bool
@@ -3803,10 +3935,14 @@  ipa_combine_adjustments (ipa_parm_adjust
       struct ipa_parm_adjustment *n;
       n = &inner[i];
 
-      if (n->remove_param)
+      if (n->op == IPA_PARM_OP_REMOVE)
 	removals++;
       else
-	tmp.quick_push (*n);
+	{
+	  /* FIXME: Handling of new arguments are not implemented yet.  */
+	  gcc_assert (n->op != IPA_PARM_OP_NEW);
+	  tmp.quick_push (*n);
+	}
     }
 
   adjustments.create (outlen + removals);
@@ -3817,27 +3953,32 @@  ipa_combine_adjustments (ipa_parm_adjust
       struct ipa_parm_adjustment *in = &tmp[out->base_index];
 
       memset (&r, 0, sizeof (r));
-      gcc_assert (!in->remove_param);
-      if (out->remove_param)
+      gcc_assert (in->op != IPA_PARM_OP_REMOVE);
+      if (out->op == IPA_PARM_OP_REMOVE)
 	{
 	  if (!index_in_adjustments_multiple_times_p (in->base_index, tmp))
 	    {
-	      r.remove_param = true;
+	      r.op = IPA_PARM_OP_REMOVE;
 	      adjustments.quick_push (r);
 	    }
 	  continue;
 	}
+      else
+	{
+	  /* FIXME: Handling of new arguments are not implemented yet.  */
+	  gcc_assert (out->op != IPA_PARM_OP_NEW);
+	}
 
       r.base_index = in->base_index;
       r.type = out->type;
 
       /* FIXME:  Create nonlocal value too.  */
 
-      if (in->copy_param && out->copy_param)
-	r.copy_param = true;
-      else if (in->copy_param)
+      if (in->op == IPA_PARM_OP_COPY && out->op == IPA_PARM_OP_COPY)
+	r.op = IPA_PARM_OP_COPY;
+      else if (in->op == IPA_PARM_OP_COPY)
 	r.offset = out->offset;
-      else if (out->copy_param)
+      else if (out->op == IPA_PARM_OP_COPY)
 	r.offset = in->offset;
       else
 	r.offset = in->offset + out->offset;
@@ -3848,7 +3989,7 @@  ipa_combine_adjustments (ipa_parm_adjust
     {
       struct ipa_parm_adjustment *n = &inner[i];
 
-      if (n->remove_param)
+      if (n->op == IPA_PARM_OP_REMOVE)
 	adjustments.quick_push (*n);
     }
 
@@ -3885,10 +4026,10 @@  ipa_dump_param_adjustments (FILE *file,
 	  fprintf (file, ", base: ");
 	  print_generic_expr (file, adj->base, 0);
 	}
-      if (adj->reduction)
+      if (adj->new_decl)
 	{
-	  fprintf (file, ", reduction: ");
-	  print_generic_expr (file, adj->reduction, 0);
+	  fprintf (file, ", new_decl: ");
+	  print_generic_expr (file, adj->new_decl, 0);
 	}
       if (adj->new_ssa_base)
 	{
@@ -3896,9 +4037,9 @@  ipa_dump_param_adjustments (FILE *file,
 	  print_generic_expr (file, adj->new_ssa_base, 0);
 	}
 
-      if (adj->copy_param)
+      if (adj->op == IPA_PARM_OP_COPY)
 	fprintf (file, ", copy_param");
-      else if (adj->remove_param)
+      else if (adj->op == IPA_PARM_OP_REMOVE)
 	fprintf (file, ", remove_param");
       else
 	fprintf (file, ", offset %li", (long) adj->offset);
--- gcc/ipa-prop.h	(.../trunk)	(revision 205223)
+++ gcc/ipa-prop.h	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -609,6 +609,27 @@  extern alloc_pool ipcp_values_pool;
 extern alloc_pool ipcp_sources_pool;
 extern alloc_pool ipcp_agg_lattice_pool;
 
+/* Operation to be performed for the parameter in ipa_parm_adjustment
+   below.  */
+enum ipa_parm_op {
+  IPA_PARM_OP_NONE,
+
+  /* This describes a brand new parameter.
+
+     The field `type' should be set to the new type, `arg_prefix'
+     should be set to the string prefix for the new DECL_NAME, and
+     `new_decl' will ultimately hold the newly created argument.  */
+  IPA_PARM_OP_NEW,
+
+  /* This new parameter is an unmodified parameter at index base_index. */
+  IPA_PARM_OP_COPY,
+
+  /* This adjustment describes a parameter that is about to be removed
+     completely.  Most users will probably need to book keep those so that they
+     don't leave behinfd any non default def ssa names belonging to them.  */
+  IPA_PARM_OP_REMOVE
+};
+
 /* Structure to describe transformations of formal parameters and actual
    arguments.  Each instance describes one new parameter and they are meant to
    be stored in a vector.  Additionally, most users will probably want to store
@@ -632,10 +653,11 @@  struct ipa_parm_adjustment
      arguments.  */
   tree alias_ptr_type;
 
-  /* The new declaration when creating/replacing a parameter.  Created by
-     ipa_modify_formal_parameters, useful for functions modifying the body
-     accordingly. */
-  tree reduction;
+  /* The new declaration when creating/replacing a parameter.  Created
+     by ipa_modify_formal_parameters, useful for functions modifying
+     the body accordingly.  For brand new arguments, this is the newly
+     created argument.  */
+  tree new_decl;
 
   /* New declaration of a substitute variable that we may use to replace all
      non-default-def ssa names when a parm decl is going away.  */
@@ -645,22 +667,23 @@  struct ipa_parm_adjustment
      is NULL), this is going to be its nonlocalized vars value.  */
   tree nonlocal_value;
 
+  /* This holds the prefix to be used for the new DECL_NAME.  */
+  const char *arg_prefix;
+
   /* Offset into the original parameter (for the cases when the new parameter
      is a component of an original one).  */
   HOST_WIDE_INT offset;
 
-  /* Zero based index of the original parameter this one is based on.  (ATM
-     there is no way to insert a new parameter out of the blue because there is
-     no need but if it arises the code can be easily exteded to do so.)  */
+  /* Zero based index of the original parameter this one is based on.  */
   int base_index;
 
-  /* This new parameter is an unmodified parameter at index base_index. */
-  unsigned copy_param : 1;
-
-  /* This adjustment describes a parameter that is about to be removed
-     completely.  Most users will probably need to book keep those so that they
-     don't leave behinfd any non default def ssa names belonging to them.  */
-  unsigned remove_param : 1;
+  /* If non-null, the parameter is a vector of `type' with this many
+     elements.  */
+  int simdlen;
+
+  /* Whether this parameter is a new parameter, a copy of an old one,
+     or one about to be removed.  */
+  enum ipa_parm_op op;
 
   /* The parameter is to be passed by reference.  */
   unsigned by_ref : 1;
@@ -671,8 +694,8 @@  typedef struct ipa_parm_adjustment ipa_p
 typedef vec<ipa_parm_adjustment_t> ipa_parm_adjustment_vec;
 
 vec<tree> ipa_get_vector_of_formal_parms (tree fndecl);
-void ipa_modify_formal_parameters (tree fndecl, ipa_parm_adjustment_vec,
-				   const char *);
+vec<tree> ipa_get_vector_of_formal_parm_types (tree fntype);
+void ipa_modify_formal_parameters (tree fndecl, ipa_parm_adjustment_vec);
 void ipa_modify_call_arguments (struct cgraph_edge *, gimple,
 				ipa_parm_adjustment_vec);
 ipa_parm_adjustment_vec ipa_combine_adjustments (ipa_parm_adjustment_vec,
@@ -690,6 +713,10 @@  tree ipa_value_from_jfunc (struct ipa_no
 			   struct ipa_jump_func *jfunc);
 unsigned int ipcp_transform_function (struct cgraph_node *node);
 void ipa_dump_param (FILE *, struct ipa_node_params *info, int i);
+bool ipa_modify_expr (tree *, bool, ipa_parm_adjustment_vec);
+ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *,
+						   ipa_parm_adjustment_vec,
+						   bool);
 
 
 /* From tree-sra.c:  */
--- gcc/omp-low.c.jj	2013-11-21 09:25:07.000000000 +0100
+++ gcc/omp-low.c	2013-11-21 22:17:19.334300797 +0100
@@ -61,6 +61,8 @@  along with GCC; see the file COPYING3.
 #include "omp-low.h"
 #include "gimple-low.h"
 #include "tree-cfgcleanup.h"
+#include "pretty-print.h"
+#include "ipa-prop.h"
 #include "tree-nested.h"
 
 
@@ -10573,5 +10677,1151 @@  make_pass_diagnose_omp_blocks (gcc::cont
 {
   return new pass_diagnose_omp_blocks (ctxt);
 }
+
+/* SIMD clone supporting code.  */
+
+/* Allocate a fresh `simd_clone' and return it.  NARGS is the number
+   of arguments to reserve space for.  */
+
+static struct cgraph_simd_clone *
+simd_clone_struct_alloc (int nargs)
+{
+  struct cgraph_simd_clone *clone_info;
+  size_t len = (sizeof (struct cgraph_simd_clone)
+		+ nargs * sizeof (struct cgraph_simd_clone_arg));
+  clone_info = (struct cgraph_simd_clone *)
+	       ggc_internal_cleared_alloc_stat (len PASS_MEM_STAT);
+  return clone_info;
+}
+
+/* Make a copy of the `struct cgraph_simd_clone' in FROM to TO.  */
+
+static inline void
+simd_clone_struct_copy (struct cgraph_simd_clone *to,
+			struct cgraph_simd_clone *from)
+{
+  memcpy (to, from, (sizeof (struct cgraph_simd_clone)
+		     + from->nargs * sizeof (struct cgraph_simd_clone_arg)));
+}
+
+/* Return vector of parameter types of function FNDECL.  This uses
+   TYPE_ARG_TYPES if available, otherwise falls back to types of
+   DECL_ARGUMENTS types.  */
+
+vec<tree>
+simd_clone_vector_of_formal_parm_types (tree fndecl)
+{
+  if (TYPE_ARG_TYPES (TREE_TYPE (fndecl)))
+    return ipa_get_vector_of_formal_parm_types (TREE_TYPE (fndecl));
+  vec<tree> args = ipa_get_vector_of_formal_parms (fndecl);
+  unsigned int i;
+  tree arg;
+  FOR_EACH_VEC_ELT (args, i, arg)
+    args[i] = TREE_TYPE (args[i]);
+  return args;
+}
+
+/* Given a simd function in NODE, extract the simd specific
+   information from the OMP clauses passed in CLAUSES, and return
+   the struct cgraph_simd_clone * if it should be cloned.  *INBRANCH_SPECIFIED
+   is set to TRUE if the `inbranch' or `notinbranch' clause specified,
+   otherwise set to FALSE.  */
+
+static struct cgraph_simd_clone *
+simd_clone_clauses_extract (struct cgraph_node *node, tree clauses,
+			    bool *inbranch_specified)
+{
+  vec<tree> args = simd_clone_vector_of_formal_parm_types (node->decl);
+  tree t;
+  int n;
+  *inbranch_specified = false;
+
+  n = args.length ();
+  if (n > 0 && args.last () == void_type_node)
+    n--;
+
+  /* To distinguish from an OpenMP simd clone, Cilk Plus functions to
+     be cloned have a distinctive artificial label in addition to "omp
+     declare simd".  */
+  bool cilk_clone
+    = (flag_enable_cilkplus
+       && lookup_attribute ("cilk plus elemental",
+			    DECL_ATTRIBUTES (node->decl)));
+
+  /* Allocate one more than needed just in case this is an in-branch
+     clone which will require a mask argument.  */
+  struct cgraph_simd_clone *clone_info = simd_clone_struct_alloc (n + 1);
+  clone_info->nargs = n;
+  clone_info->cilk_elemental = cilk_clone;
+
+  if (!clauses)
+    {
+      args.release ();
+      return clone_info;
+    }
+  clauses = TREE_VALUE (clauses);
+  if (!clauses || TREE_CODE (clauses) != OMP_CLAUSE)
+    return clone_info;
+
+  for (t = clauses; t; t = OMP_CLAUSE_CHAIN (t))
+    {
+      switch (OMP_CLAUSE_CODE (t))
+	{
+	case OMP_CLAUSE_INBRANCH:
+	  clone_info->inbranch = 1;
+	  *inbranch_specified = true;
+	  break;
+	case OMP_CLAUSE_NOTINBRANCH:
+	  clone_info->inbranch = 0;
+	  *inbranch_specified = true;
+	  break;
+	case OMP_CLAUSE_SIMDLEN:
+	  clone_info->simdlen
+	    = TREE_INT_CST_LOW (OMP_CLAUSE_SIMDLEN_EXPR (t));
+	  break;
+	case OMP_CLAUSE_LINEAR:
+	  {
+	    tree decl = OMP_CLAUSE_DECL (t);
+	    tree step = OMP_CLAUSE_LINEAR_STEP (t);
+	    int argno = TREE_INT_CST_LOW (decl);
+	    if (OMP_CLAUSE_LINEAR_VARIABLE_STRIDE (t))
+	      {
+		clone_info->args[argno].arg_type
+		  = SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP;
+		clone_info->args[argno].linear_step = tree_to_shwi (step);
+		gcc_assert (clone_info->args[argno].linear_step >= 0
+			    && clone_info->args[argno].linear_step < n);
+	      }
+	    else
+	      {
+		if (POINTER_TYPE_P (args[argno]))
+		  step = fold_convert (ssizetype, step);
+		if (!tree_fits_shwi_p (step))
+		  {
+		    warning_at (OMP_CLAUSE_LOCATION (t), 0,
+				"ignoring large linear step");
+		    args.release ();
+		    return NULL;
+		  }
+		else if (integer_zerop (step))
+		  {
+		    warning_at (OMP_CLAUSE_LOCATION (t), 0,
+				"ignoring zero linear step");
+		    args.release ();
+		    return NULL;
+		  }
+		else
+		  {
+		    clone_info->args[argno].arg_type
+		      = SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP;
+		    clone_info->args[argno].linear_step = tree_to_shwi (step);
+		  }
+	      }
+	    break;
+	  }
+	case OMP_CLAUSE_UNIFORM:
+	  {
+	    tree decl = OMP_CLAUSE_DECL (t);
+	    int argno = tree_to_uhwi (decl);
+	    clone_info->args[argno].arg_type
+	      = SIMD_CLONE_ARG_TYPE_UNIFORM;
+	    break;
+	  }
+	case OMP_CLAUSE_ALIGNED:
+	  {
+	    tree decl = OMP_CLAUSE_DECL (t);
+	    int argno = tree_to_uhwi (decl);
+	    clone_info->args[argno].alignment
+	      = TREE_INT_CST_LOW (OMP_CLAUSE_ALIGNED_ALIGNMENT (t));
+	    break;
+	  }
+	default:
+	  break;
+	}
+    }
+  args.release ();
+  return clone_info;
+}
+
+/* Given a SIMD clone in NODE, calculate the characteristic data
+   type and return the coresponding type.  The characteristic data
+   type is computed as described in the Intel Vector ABI.  */
+
+static tree
+simd_clone_compute_base_data_type (struct cgraph_node *node,
+				   struct cgraph_simd_clone *clone_info)
+{
+  tree type = integer_type_node;
+  tree fndecl = node->decl;
+
+  /* a) For non-void function, the characteristic data type is the
+        return type.  */
+  if (TREE_CODE (TREE_TYPE (TREE_TYPE (fndecl))) != VOID_TYPE)
+    type = TREE_TYPE (TREE_TYPE (fndecl));
+
+  /* b) If the function has any non-uniform, non-linear parameters,
+        then the characteristic data type is the type of the first
+        such parameter.  */
+  else
+    {
+      vec<tree> map = simd_clone_vector_of_formal_parm_types (fndecl);
+      for (unsigned int i = 0; i < clone_info->nargs; ++i)
+	if (clone_info->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
+	  {
+	    type = map[i];
+	    break;
+	  }
+      map.release ();
+    }
+
+  /* c) If the characteristic data type determined by a) or b) above
+        is struct, union, or class type which is pass-by-value (except
+        for the type that maps to the built-in complex data type), the
+        characteristic data type is int.  */
+  if (RECORD_OR_UNION_TYPE_P (type)
+      && !aggregate_value_p (type, NULL)
+      && TREE_CODE (type) != COMPLEX_TYPE)
+    return integer_type_node;
+
+  /* d) If none of the above three classes is applicable, the
+        characteristic data type is int.  */
+
+  return type;
+
+  /* e) For Intel Xeon Phi native and offload compilation, if the
+        resulting characteristic data type is 8-bit or 16-bit integer
+        data type, the characteristic data type is int.  */
+  /* Well, we don't handle Xeon Phi yet.  */
+}
+
+static tree
+simd_clone_mangle (struct cgraph_node *node,
+		   struct cgraph_simd_clone *clone_info)
+{
+  char vecsize_mangle = clone_info->vecsize_mangle;
+  char mask = clone_info->inbranch ? 'M' : 'N';
+  unsigned int simdlen = clone_info->simdlen;
+  unsigned int n;
+  pretty_printer pp;
+
+  gcc_assert (vecsize_mangle && simdlen);
+
+  pp_string (&pp, "_ZGV");
+  pp_character (&pp, vecsize_mangle);
+  pp_character (&pp, mask);
+  pp_decimal_int (&pp, simdlen);
+
+  for (n = 0; n < clone_info->nargs; ++n)
+    {
+      struct cgraph_simd_clone_arg arg = clone_info->args[n];
+
+      if (arg.arg_type == SIMD_CLONE_ARG_TYPE_UNIFORM)
+	pp_character (&pp, 'u');
+      else if (arg.arg_type == SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP)
+	{
+	  gcc_assert (arg.linear_step != 0);
+	  pp_character (&pp, 'l');
+	  if (arg.linear_step > 1)
+	    pp_unsigned_wide_integer (&pp, arg.linear_step);
+	  else if (arg.linear_step < 0)
+	    {
+	      pp_character (&pp, 'n');
+	      pp_unsigned_wide_integer (&pp, (-(unsigned HOST_WIDE_INT)
+					      arg.linear_step));
+	    }
+	}
+      else if (arg.arg_type == SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP)
+	{
+	  pp_character (&pp, 's');
+	  pp_unsigned_wide_integer (&pp, arg.linear_step);
+	}
+      else
+	pp_character (&pp, 'v');
+      if (arg.alignment)
+	{
+	  pp_character (&pp, 'a');
+	  pp_decimal_int (&pp, arg.alignment);
+	}
+    }
+
+  pp_underscore (&pp);
+  pp_string (&pp,
+	     IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl)));
+  const char *str = pp_formatted_text (&pp);
+
+  /* If there already is a SIMD clone with the same mangled name, don't
+     add another one.  This can happen e.g. for
+     #pragma omp declare simd
+     #pragma omp declare simd simdlen(8)
+     int foo (int, int);
+     if the simdlen is assumed to be 8 for the first one, etc.  */
+  for (struct cgraph_node *clone = node->simd_clones; clone;
+       clone = clone->simdclone->next_clone)
+    if (strcmp (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (clone->decl)),
+		str) == 0)
+      return NULL_TREE;
+
+  return get_identifier (str);
+}
+
+/* Create a simd clone of OLD_NODE and return it.  */
+
+static struct cgraph_node *
+simd_clone_create (struct cgraph_node *old_node)
+{
+  struct cgraph_node *new_node;
+  if (old_node->definition)
+    new_node = cgraph_function_versioning (old_node, vNULL, NULL, NULL, false,
+					   NULL, NULL, "simdclone");
+  else
+    {
+      tree old_decl = old_node->decl;
+      tree new_decl = copy_node (old_node->decl);
+      DECL_NAME (new_decl) = clone_function_name (old_decl, "simdclone");
+      SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
+      SET_DECL_RTL (new_decl, NULL);
+      DECL_STATIC_CONSTRUCTOR (new_decl) = 0;
+      DECL_STATIC_DESTRUCTOR (new_decl) = 0;
+      new_node
+	= cgraph_copy_node_for_versioning (old_node, new_decl, vNULL, NULL);
+      cgraph_call_function_insertion_hooks (new_node);
+    }
+  if (new_node == NULL)
+    return new_node;
+
+  TREE_PUBLIC (new_node->decl) = TREE_PUBLIC (old_node->decl);
+
+  /* The function cgraph_function_versioning () will force the new
+     symbol local.  Undo this, and inherit external visability from
+     the old node.  */
+  new_node->local.local = old_node->local.local;
+  new_node->externally_visible = old_node->externally_visible;
+
+  return new_node;
+}
+
+/* Adjust the return type of the given function to its appropriate
+   vector counterpart.  Returns a simd array to be used throughout the
+   function as a return value.  */
+
+static tree
+simd_clone_adjust_return_type (struct cgraph_node *node)
+{
+  tree fndecl = node->decl;
+  tree orig_rettype = TREE_TYPE (TREE_TYPE (fndecl));
+  unsigned int veclen;
+  tree t;
+
+  /* Adjust the function return type.  */
+  if (orig_rettype == void_type_node)
+    return NULL_TREE;
+  TREE_TYPE (fndecl) = build_distinct_type_copy (TREE_TYPE (fndecl));
+  if (INTEGRAL_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl)))
+      || POINTER_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl))))
+    veclen = node->simdclone->vecsize_int;
+  else
+    veclen = node->simdclone->vecsize_float;
+  veclen /= GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (fndecl))));
+  if (veclen > node->simdclone->simdlen)
+    veclen = node->simdclone->simdlen;
+  if (veclen == node->simdclone->simdlen)
+    TREE_TYPE (TREE_TYPE (fndecl))
+      = build_vector_type (TREE_TYPE (TREE_TYPE (fndecl)),
+			   node->simdclone->simdlen);
+  else
+    {
+      t = build_vector_type (TREE_TYPE (TREE_TYPE (fndecl)), veclen);
+      t = build_array_type_nelts (t, node->simdclone->simdlen / veclen);
+      TREE_TYPE (TREE_TYPE (fndecl)) = t;
+    }
+  if (!node->definition)
+    return NULL_TREE;
+
+  t = DECL_RESULT (fndecl);
+  /* Adjust the DECL_RESULT.  */
+  gcc_assert (TREE_TYPE (t) != void_type_node);
+  TREE_TYPE (t) = TREE_TYPE (TREE_TYPE (fndecl));
+  relayout_decl (t);
+
+  tree atype = build_array_type_nelts (orig_rettype,
+				       node->simdclone->simdlen);
+  if (veclen != node->simdclone->simdlen)
+    return build1 (VIEW_CONVERT_EXPR, atype, t);
+
+  /* Set up a SIMD array to use as the return value.  */
+  tree retval = create_tmp_var_raw (atype, "retval");
+  gimple_add_tmp_var (retval);
+  return retval;
+}
+
+/* Each vector argument has a corresponding array to be used locally
+   as part of the eventual loop.  Create such temporary array and
+   return it.
+
+   PREFIX is the prefix to be used for the temporary.
+
+   TYPE is the inner element type.
+
+   SIMDLEN is the number of elements.  */
+
+static tree
+create_tmp_simd_array (const char *prefix, tree type, int simdlen)
+{
+  tree atype = build_array_type_nelts (type, simdlen);
+  tree avar = create_tmp_var_raw (atype, prefix);
+  gimple_add_tmp_var (avar);
+  return avar;
+}
+
+/* Modify the function argument types to their corresponding vector
+   counterparts if appropriate.  Also, create one array for each simd
+   argument to be used locally when using the function arguments as
+   part of the loop.
+
+   NODE is the function whose arguments are to be adjusted.
+
+   Returns an adjustment vector that will be filled describing how the
+   argument types will be adjusted.  */
+
+static ipa_parm_adjustment_vec
+simd_clone_adjust_argument_types (struct cgraph_node *node)
+{
+  vec<tree> args;
+  ipa_parm_adjustment_vec adjustments;
+
+  if (node->definition)
+    args = ipa_get_vector_of_formal_parms (node->decl);
+  else
+    args = simd_clone_vector_of_formal_parm_types (node->decl);
+  adjustments.create (args.length ());
+  unsigned i, j, veclen;
+  struct ipa_parm_adjustment adj;
+  for (i = 0; i < node->simdclone->nargs; ++i)
+    {
+      memset (&adj, 0, sizeof (adj));
+      tree parm = args[i];
+      tree parm_type = node->definition ? TREE_TYPE (parm) : parm;
+      adj.base_index = i;
+      adj.base = parm;
+
+      node->simdclone->args[i].orig_arg = node->definition ? parm : NULL_TREE;
+      node->simdclone->args[i].orig_type = parm_type;
+
+      if (node->simdclone->args[i].arg_type != SIMD_CLONE_ARG_TYPE_VECTOR)
+	{
+	  /* No adjustment necessary for scalar arguments.  */
+	  adj.op = IPA_PARM_OP_COPY;
+	}
+      else
+	{
+	  if (INTEGRAL_TYPE_P (parm_type) || POINTER_TYPE_P (parm_type))
+	    veclen = node->simdclone->vecsize_int;
+	  else
+	    veclen = node->simdclone->vecsize_float;
+	  veclen /= GET_MODE_BITSIZE (TYPE_MODE (parm_type));
+	  if (veclen > node->simdclone->simdlen)
+	    veclen = node->simdclone->simdlen;
+	  adj.simdlen = veclen;
+	  adj.arg_prefix = "simd";
+	  if (POINTER_TYPE_P (parm_type))
+	    adj.by_ref = 1;
+	  adj.type = parm_type;
+	  node->simdclone->args[i].vector_type
+	    = build_vector_type (parm_type, veclen);
+	  for (j = veclen; j < node->simdclone->simdlen; j += veclen)
+	    {
+	      adjustments.safe_push (adj);
+	      if (j == veclen)
+		{
+		  memset (&adj, 0, sizeof (adj));
+		  adj.op = IPA_PARM_OP_NEW;
+		  adj.arg_prefix = "simd";
+		  adj.base_index = i;
+		  adj.type = node->simdclone->args[i].vector_type;
+		}
+	    }
+
+	  if (node->definition)
+	    node->simdclone->args[i].simd_array
+	      = create_tmp_simd_array (IDENTIFIER_POINTER (DECL_NAME (parm)),
+				       parm_type, node->simdclone->simdlen);
+	}
+      adjustments.safe_push (adj);
+    }
+
+  if (node->simdclone->inbranch)
+    {
+      tree base_type
+	= simd_clone_compute_base_data_type (node->simdclone->origin,
+					     node->simdclone);
+
+      memset (&adj, 0, sizeof (adj));
+      adj.op = IPA_PARM_OP_NEW;
+      adj.arg_prefix = "mask";
+
+      adj.base_index = i;
+      if (INTEGRAL_TYPE_P (base_type) || POINTER_TYPE_P (base_type))
+	veclen = node->simdclone->vecsize_int;
+      else
+	veclen = node->simdclone->vecsize_float;
+      veclen /= GET_MODE_BITSIZE (TYPE_MODE (base_type));
+      if (veclen > node->simdclone->simdlen)
+	veclen = node->simdclone->simdlen;
+      adj.type = build_vector_type (base_type, veclen);
+      adjustments.safe_push (adj);
+
+      for (j = veclen; j < node->simdclone->simdlen; j += veclen)
+	adjustments.safe_push (adj);
+
+      /* We have previously allocated one extra entry for the mask.  Use
+	 it and fill it.  */
+      struct cgraph_simd_clone *sc = node->simdclone;
+      sc->nargs++;
+      if (node->definition)
+	{
+	  sc->args[i].orig_arg
+	    = build_decl (UNKNOWN_LOCATION, PARM_DECL, NULL, base_type);
+	  sc->args[i].simd_array
+	    = create_tmp_simd_array ("mask", base_type, sc->simdlen);
+	}
+      sc->args[i].orig_type = base_type;
+      sc->args[i].arg_type = SIMD_CLONE_ARG_TYPE_MASK;
+    }
+
+  if (node->definition)
+    ipa_modify_formal_parameters (node->decl, adjustments);
+  else
+    {
+      tree new_arg_types = NULL_TREE, new_reversed;
+      bool last_parm_void = false;
+      if (args.length () > 0 && args.last () == void_type_node)
+	last_parm_void = true;
+
+      gcc_assert (TYPE_ARG_TYPES (TREE_TYPE (node->decl)));
+      j = adjustments.length ();
+      for (i = 0; i < j; i++)
+	{
+	  struct ipa_parm_adjustment *adj = &adjustments[i];
+	  tree ptype;
+	  if (adj->op == IPA_PARM_OP_COPY)
+	    ptype = args[adj->base_index];
+	  else if (adj->simdlen)
+	    ptype = build_vector_type (adj->type, adj->simdlen);
+	  else
+	    ptype = adj->type;
+	  new_arg_types = tree_cons (NULL_TREE, ptype, new_arg_types);
+	}
+      new_reversed = nreverse (new_arg_types);
+      if (last_parm_void)
+	{
+	  if (new_reversed)
+	    TREE_CHAIN (new_arg_types) = void_list_node;
+	  else
+	    new_reversed = void_list_node;
+	}
+
+      tree new_type = build_distinct_type_copy (TREE_TYPE (node->decl));
+      TYPE_ARG_TYPES (new_type) = new_reversed;
+      TREE_TYPE (node->decl) = new_type;
+
+      adjustments.release ();
+    }
+  args.release ();
+  return adjustments;
+}
+
+/* Initialize and copy the function arguments in NODE to their
+   corresponding local simd arrays.  Returns a fresh gimple_seq with
+   the instruction sequence generated.  */
+
+static gimple_seq
+simd_clone_init_simd_arrays (struct cgraph_node *node,
+			     ipa_parm_adjustment_vec adjustments)
+{
+  gimple_seq seq = NULL;
+  unsigned i = 0, j = 0, k;
+
+  for (tree arg = DECL_ARGUMENTS (node->decl);
+       arg;
+       arg = DECL_CHAIN (arg), i++, j++)
+    {
+      if (adjustments[j].op == IPA_PARM_OP_COPY)
+	continue;
+
+      node->simdclone->args[i].vector_arg = arg;
+
+      tree array = node->simdclone->args[i].simd_array;
+      if ((unsigned) adjustments[j].simdlen == node->simdclone->simdlen)
+	{
+	  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
+	  tree ptr = build_fold_addr_expr (array);
+	  tree t = build2 (MEM_REF, TREE_TYPE (arg), ptr,
+			   build_int_cst (ptype, 0));
+	  t = build2 (MODIFY_EXPR, TREE_TYPE (t), t, arg);
+	  gimplify_and_add (t, &seq);
+	}
+      else
+	{
+	  unsigned int simdlen = adjustments[j].simdlen;
+	  if (node->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_MASK)
+	    simdlen = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg));
+	  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
+	  for (k = 0; k < node->simdclone->simdlen; k += simdlen)
+	    {
+	      tree ptr = build_fold_addr_expr (array);
+	      int elemsize;
+	      if (k)
+		{
+		  arg = DECL_CHAIN (arg);
+		  j++;
+		}
+	      elemsize
+		= GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (TREE_TYPE (arg))));
+	      tree t = build2 (MEM_REF, TREE_TYPE (arg), ptr,
+			       build_int_cst (ptype, k * elemsize));
+	      t = build2 (MODIFY_EXPR, TREE_TYPE (t), t, arg);
+	      gimplify_and_add (t, &seq);
+	    }
+	}
+    }
+  return seq;
+}
+
+/* Callback info for ipa_simd_modify_stmt_ops below.  */
+
+struct modify_stmt_info {
+  ipa_parm_adjustment_vec adjustments;
+  gimple stmt;
+  /* True if the parent statement was modified by
+     ipa_simd_modify_stmt_ops.  */
+  bool modified;
+};
+
+/* Callback for walk_gimple_op.
+
+   Adjust operands from a given statement as specified in the
+   adjustments vector in the callback data.  */
+
+static tree
+ipa_simd_modify_stmt_ops (tree *tp, int *walk_subtrees, void *data)
+{
+  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
+  if (!SSA_VAR_P (*tp))
+    {
+      /* Make sure we treat subtrees as a RHS.  This makes sure that
+	 when examining the `*foo' in *foo=x, the `foo' get treated as
+	 a use properly.  */
+      wi->is_lhs = false;
+      wi->val_only = true;
+      if (TYPE_P (*tp))
+	*walk_subtrees = 0;
+      return NULL_TREE;
+    }
+  struct modify_stmt_info *info = (struct modify_stmt_info *) wi->info;
+  struct ipa_parm_adjustment *cand
+    = ipa_get_adjustment_candidate (&tp, NULL, info->adjustments, true);
+  if (!cand)
+    return NULL_TREE;
+
+  tree t = *tp;
+  tree repl = make_ssa_name (TREE_TYPE (t), NULL);
+
+  gimple stmt;
+  gimple_stmt_iterator gsi = gsi_for_stmt (info->stmt);
+  if (wi->is_lhs)
+    {
+      stmt = gimple_build_assign (unshare_expr (cand->new_decl), repl);
+      gsi_insert_after (&gsi, stmt, GSI_SAME_STMT);
+      SSA_NAME_DEF_STMT (repl) = info->stmt;
+    }
+  else
+    {
+      /* You'd think we could skip the extra SSA variable when
+	 wi->val_only=true, but we may have `*var' which will get
+	 replaced into `*var_array[iter]' and will likely be something
+	 not gimple.  */
+      stmt = gimple_build_assign (repl, unshare_expr (cand->new_decl));
+      gsi_insert_before (&gsi, stmt, GSI_SAME_STMT);
+    }
+
+  if (!useless_type_conversion_p (TREE_TYPE (*tp), TREE_TYPE (repl)))
+    {
+      tree vce = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (*tp), repl);
+      *tp = vce;
+    }
+  else
+    *tp = repl;
+
+  info->modified = true;
+  wi->is_lhs = false;
+  wi->val_only = true;
+  return NULL_TREE;
+}
+
+/* Traverse the function body and perform all modifications as
+   described in ADJUSTMENTS.  At function return, ADJUSTMENTS will be
+   modified such that the replacement/reduction value will now be an
+   offset into the corresponding simd_array.
+
+   This function will replace all function argument uses with their
+   corresponding simd array elements, and ajust the return values
+   accordingly.  */
+
+static void
+ipa_simd_modify_function_body (struct cgraph_node *node,
+			       ipa_parm_adjustment_vec adjustments,
+			       tree retval_array, tree iter)
+{
+  basic_block bb;
+  unsigned int i, j;
+
+  /* Re-use the adjustments array, but this time use it to replace
+     every function argument use to an offset into the corresponding
+     simd_array.  */
+  for (i = 0, j = 0; i < node->simdclone->nargs; ++i, ++j)
+    {
+      if (!node->simdclone->args[i].vector_arg)
+	continue;
+
+      tree basetype = TREE_TYPE (node->simdclone->args[i].orig_arg);
+      adjustments[j].new_decl
+	= build4 (ARRAY_REF,
+		  basetype,
+		  node->simdclone->args[i].simd_array,
+		  iter,
+		  NULL_TREE, NULL_TREE);
+      if (adjustments[j].op == IPA_PARM_OP_NONE
+	  && (unsigned) adjustments[j].simdlen < node->simdclone->simdlen)
+	j += node->simdclone->simdlen / adjustments[j].simdlen - 1;
+    }
+
+  struct modify_stmt_info info;
+  info.adjustments = adjustments;
+
+  FOR_EACH_BB_FN (bb, DECL_STRUCT_FUNCTION (node->decl))
+    {
+      gimple_stmt_iterator gsi;
+
+      gsi = gsi_start_bb (bb);
+      while (!gsi_end_p (gsi))
+	{
+	  gimple stmt = gsi_stmt (gsi);
+	  info.stmt = stmt;
+	  struct walk_stmt_info wi;
+
+	  memset (&wi, 0, sizeof (wi));
+	  info.modified = false;
+	  wi.info = &info;
+	  walk_gimple_op (stmt, ipa_simd_modify_stmt_ops, &wi);
+
+	  if (gimple_code (stmt) == GIMPLE_RETURN)
+	    {
+	      tree retval = gimple_return_retval (stmt);
+	      if (!retval)
+		{
+		  gsi_remove (&gsi, true);
+		  continue;
+		}
+
+	      /* Replace `return foo' with `retval_array[iter] = foo'.  */
+	      tree ref = build4 (ARRAY_REF, TREE_TYPE (retval),
+				 retval_array, iter, NULL, NULL);
+	      stmt = gimple_build_assign (ref, retval);
+	      gsi_replace (&gsi, stmt, true);
+	      info.modified = true;
+	    }
+
+	  if (info.modified)
+	    {
+	      update_stmt (stmt);
+	      if (maybe_clean_eh_stmt (stmt))
+		gimple_purge_dead_eh_edges (gimple_bb (stmt));
+	    }
+	  gsi_next (&gsi);
+	}
+    }
+}
+
+/* Adjust the argument types in NODE to their appropriate vector
+   counterparts.  */
+
+static void
+simd_clone_adjust (struct cgraph_node *node)
+{
+  push_cfun (DECL_STRUCT_FUNCTION (node->decl));
+
+  targetm.simd_clone.adjust (node);
+
+  tree retval = simd_clone_adjust_return_type (node);
+  ipa_parm_adjustment_vec adjustments
+    = simd_clone_adjust_argument_types (node);
+
+  push_gimplify_context ();
+
+  gimple_seq seq = simd_clone_init_simd_arrays (node, adjustments);
+
+  /* Adjust all uses of vector arguments accordingly.  Adjust all
+     return values accordingly.  */
+  tree iter = create_tmp_var (unsigned_type_node, "iter");
+  tree iter1 = make_ssa_name (iter, NULL);
+  tree iter2 = make_ssa_name (iter, NULL);
+  ipa_simd_modify_function_body (node, adjustments, retval, iter1);
+
+  /* Initialize the iteration variable.  */
+  basic_block entry_bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+  basic_block body_bb = split_block_after_labels (entry_bb)->dest;
+  gimple_stmt_iterator gsi = gsi_after_labels (entry_bb);
+  /* Insert the SIMD array and iv initialization at function
+     entry.  */
+  gsi_insert_seq_before (&gsi, seq, GSI_NEW_STMT);
+
+  pop_gimplify_context (NULL);
+
+  /* Create a new BB right before the original exit BB, to hold the
+     iteration increment and the condition/branch.  */
+  basic_block orig_exit = EDGE_PRED (EXIT_BLOCK_PTR_FOR_FN (cfun), 0)->src;
+  basic_block incr_bb = create_empty_bb (orig_exit);
+  /* The succ of orig_exit was EXIT_BLOCK_PTR_FOR_FN (cfun), with an empty
+     flag.  Set it now to be a FALLTHRU_EDGE.  */
+  gcc_assert (EDGE_COUNT (orig_exit->succs) == 1);
+  EDGE_SUCC (orig_exit, 0)->flags |= EDGE_FALLTHRU;
+  for (unsigned i = 0;
+       i < EDGE_COUNT (EXIT_BLOCK_PTR_FOR_FN (cfun)->preds); ++i)
+    {
+      edge e = EDGE_PRED (EXIT_BLOCK_PTR_FOR_FN (cfun), i);
+      redirect_edge_succ (e, incr_bb);
+    }
+  edge e = make_edge (incr_bb, EXIT_BLOCK_PTR_FOR_FN (cfun), 0);
+  e->probability = REG_BR_PROB_BASE;
+  gsi = gsi_last_bb (incr_bb);
+  gimple g = gimple_build_assign_with_ops (PLUS_EXPR, iter2, iter1,
+					   build_int_cst (unsigned_type_node,
+							  1));
+  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
+
+  /* Mostly annotate the loop for the vectorizer (the rest is done below).  */
+  struct loop *loop = alloc_loop ();
+  cfun->has_force_vect_loops = true;
+  loop->safelen = node->simdclone->simdlen;
+  loop->force_vect = true;
+  loop->header = body_bb;
+  add_bb_to_loop (incr_bb, loop);
+
+  /* Branch around the body if the mask applies.  */
+  if (node->simdclone->inbranch)
+    {
+      gimple_stmt_iterator gsi = gsi_last_bb (loop->header);
+      tree mask_array
+	= node->simdclone->args[node->simdclone->nargs - 1].simd_array;
+      tree mask = make_ssa_name (TREE_TYPE (TREE_TYPE (mask_array)), NULL);
+      tree aref = build4 (ARRAY_REF,
+			  TREE_TYPE (TREE_TYPE (mask_array)),
+			  mask_array, iter1,
+			  NULL, NULL);
+      g = gimple_build_assign (mask, aref);
+      gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
+      int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (aref)));
+      if (!INTEGRAL_TYPE_P (TREE_TYPE (aref)))
+	{
+	  aref = build1 (VIEW_CONVERT_EXPR,
+			 build_nonstandard_integer_type (bitsize, 0), mask);
+	  mask = make_ssa_name (TREE_TYPE (aref), NULL);
+	  g = gimple_build_assign (mask, aref);
+	  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
+	}
+
+      g = gimple_build_cond (EQ_EXPR, mask, build_zero_cst (TREE_TYPE (mask)),
+			     NULL, NULL);
+      gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
+      make_edge (loop->header, incr_bb, EDGE_TRUE_VALUE);
+      FALLTHRU_EDGE (loop->header)->flags = EDGE_FALSE_VALUE;
+    }
+
+  /* Generate the condition.  */
+  g = gimple_build_cond (LT_EXPR,
+			 iter2,
+			 build_int_cst (unsigned_type_node,
+					node->simdclone->simdlen),
+			 NULL, NULL);
+  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
+  e = split_block (incr_bb, gsi_stmt (gsi));
+  basic_block latch_bb = e->dest;
+  basic_block new_exit_bb = e->dest;
+  new_exit_bb = split_block (latch_bb, NULL)->dest;
+  loop->latch = latch_bb;
+
+  redirect_edge_succ (FALLTHRU_EDGE (latch_bb), body_bb);
+
+  make_edge (incr_bb, new_exit_bb, EDGE_FALSE_VALUE);
+  /* The successor of incr_bb is already pointing to latch_bb; just
+     change the flags.
+     make_edge (incr_bb, latch_bb, EDGE_TRUE_VALUE);  */
+  FALLTHRU_EDGE (incr_bb)->flags = EDGE_TRUE_VALUE;
+
+  gimple phi = create_phi_node (iter1, body_bb);
+  edge preheader_edge = find_edge (entry_bb, body_bb);
+  edge latch_edge = single_succ_edge (latch_bb);
+  add_phi_arg (phi, build_zero_cst (unsigned_type_node), preheader_edge,
+	       UNKNOWN_LOCATION);
+  add_phi_arg (phi, iter2, latch_edge, UNKNOWN_LOCATION);
+
+  /* Generate the new return.  */
+  gsi = gsi_last_bb (new_exit_bb);
+  if (retval
+      && TREE_CODE (retval) == VIEW_CONVERT_EXPR
+      && TREE_CODE (TREE_OPERAND (retval, 0)) == RESULT_DECL)
+    retval = TREE_OPERAND (retval, 0);
+  else if (retval)
+    {
+      retval = build1 (VIEW_CONVERT_EXPR,
+		       TREE_TYPE (TREE_TYPE (node->decl)),
+		       retval);
+      retval = force_gimple_operand_gsi (&gsi, retval, true, NULL,
+					 false, GSI_CONTINUE_LINKING);
+    }
+  g = gimple_build_return (retval);
+  gsi_insert_after (&gsi, g, GSI_CONTINUE_LINKING);
+
+  /* Handle aligned clauses by replacing default defs of the aligned
+     uniform args with __builtin_assume_aligned (arg_N(D), alignment)
+     lhs.  Handle linear by adding PHIs.  */
+  for (unsigned i = 0; i < node->simdclone->nargs; i++)
+    if (node->simdclone->args[i].alignment
+	&& node->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_UNIFORM
+	&& (node->simdclone->args[i].alignment
+	    & (node->simdclone->args[i].alignment - 1)) == 0
+	&& TREE_CODE (TREE_TYPE (node->simdclone->args[i].orig_arg))
+	   == POINTER_TYPE)
+      {
+	unsigned int alignment = node->simdclone->args[i].alignment;
+	tree orig_arg = node->simdclone->args[i].orig_arg;
+	tree def = ssa_default_def (cfun, orig_arg);
+	if (!has_zero_uses (def))
+	  {
+	    tree fn = builtin_decl_explicit (BUILT_IN_ASSUME_ALIGNED);
+	    gimple_seq seq = NULL;
+	    bool need_cvt = false;
+	    gimple call
+	      = gimple_build_call (fn, 2, def, size_int (alignment));
+	    g = call;
+	    if (!useless_type_conversion_p (TREE_TYPE (orig_arg),
+					    ptr_type_node))
+	      need_cvt = true;
+	    tree t = make_ssa_name (need_cvt ? ptr_type_node : orig_arg, NULL);
+	    gimple_call_set_lhs (g, t);
+	    gimple_seq_add_stmt_without_update (&seq, g);
+	    if (need_cvt)
+	      {
+		t = make_ssa_name (orig_arg, NULL);
+		g = gimple_build_assign_with_ops (NOP_EXPR, t,
+						  gimple_call_lhs (g),
+						  NULL_TREE);
+		gimple_seq_add_stmt_without_update (&seq, g);
+	      }
+	    gsi_insert_seq_on_edge_immediate
+	      (single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)), seq);
+
+	    entry_bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+	    int freq = compute_call_stmt_bb_frequency (current_function_decl,
+						       entry_bb);
+	    cgraph_create_edge (node, cgraph_get_create_node (fn),
+				call, entry_bb->count, freq);
+
+	    imm_use_iterator iter;
+	    use_operand_p use_p;
+	    gimple use_stmt;
+	    tree repl = gimple_get_lhs (g);
+	    FOR_EACH_IMM_USE_STMT (use_stmt, iter, def)
+	      if (is_gimple_debug (use_stmt) || use_stmt == call)
+		continue;
+	      else
+		FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
+		  SET_USE (use_p, repl);
+	  }
+      }
+    else if (node->simdclone->args[i].arg_type
+	     == SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP)
+      {
+	tree orig_arg = node->simdclone->args[i].orig_arg;
+	tree def = ssa_default_def (cfun, orig_arg);
+	gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (orig_arg))
+		    || POINTER_TYPE_P (TREE_TYPE (orig_arg)));
+	if (!has_zero_uses (def))
+	  {
+	    iter1 = make_ssa_name (orig_arg, NULL);
+	    iter2 = make_ssa_name (orig_arg, NULL);
+	    phi = create_phi_node (iter1, body_bb);
+	    add_phi_arg (phi, def, preheader_edge, UNKNOWN_LOCATION);
+	    add_phi_arg (phi, iter2, latch_edge, UNKNOWN_LOCATION);
+	    enum tree_code code = INTEGRAL_TYPE_P (TREE_TYPE (orig_arg))
+				  ? PLUS_EXPR : POINTER_PLUS_EXPR;
+	    tree addtype = INTEGRAL_TYPE_P (TREE_TYPE (orig_arg))
+			   ? TREE_TYPE (orig_arg) : sizetype;
+	    tree addcst
+	      = build_int_cst (addtype, node->simdclone->args[i].linear_step);
+	    g = gimple_build_assign_with_ops (code, iter2, iter1, addcst);
+	    gsi = gsi_last_bb (incr_bb);
+	    gsi_insert_before (&gsi, g, GSI_SAME_STMT);
+
+	    imm_use_iterator iter;
+	    use_operand_p use_p;
+	    gimple use_stmt;
+	    FOR_EACH_IMM_USE_STMT (use_stmt, iter, def)
+	      if (use_stmt == phi)
+		continue;
+	      else
+		FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
+		  SET_USE (use_p, iter1);
+	  }
+      }
+
+  calculate_dominance_info (CDI_DOMINATORS);
+  add_loop (loop, loop->header->loop_father);
+  update_ssa (TODO_update_ssa);
+
+  pop_cfun ();
+}
+
+/* If the function in NODE is tagged as an elemental SIMD function,
+   create the appropriate SIMD clones.  */
+
+static void
+expand_simd_clones (struct cgraph_node *node)
+{
+  if (lookup_attribute ("noclone", DECL_ATTRIBUTES (node->decl)))
+    return;
+
+  tree attr = lookup_attribute ("omp declare simd",
+				DECL_ATTRIBUTES (node->decl));
+  if (!attr || targetm.simd_clone.compute_vecsize_and_simdlen == NULL)
+    return;
+  /* Ignore
+     #pragma omp declare simd
+     extern int foo ();
+     in C, there we don't know the argument types at all.  */
+  if (!node->definition
+      && TYPE_ARG_TYPES (TREE_TYPE (node->decl)) == NULL_TREE)
+    return;
+  do
+    {
+      bool inbranch_clause_specified;
+      struct cgraph_simd_clone *clone_info
+	= simd_clone_clauses_extract (node, TREE_VALUE (attr),
+				      &inbranch_clause_specified);
+      if (clone_info == NULL)
+	continue;
+
+      int orig_simdlen = clone_info->simdlen;
+      tree base_type = simd_clone_compute_base_data_type (node, clone_info);
+      int count
+	= targetm.simd_clone.compute_vecsize_and_simdlen (node, clone_info,
+							  base_type, 0);
+      if (count == 0)
+	continue;
+
+      for (int i = 0; i < count * 2; i++)
+	{
+	  struct cgraph_simd_clone *clone = clone_info;
+	  if (inbranch_clause_specified && (i & 1) != 0)
+	    continue;
+
+	  if (i != 0)
+	    {
+	      clone = simd_clone_struct_alloc (clone_info->nargs
+					       - clone_info->inbranch
+					       + ((i & 1) != 0));
+	      simd_clone_struct_copy (clone, clone_info);
+	      clone->nargs -= clone_info->inbranch;
+	      clone->simdlen = orig_simdlen;
+	      targetm.simd_clone.compute_vecsize_and_simdlen (node, clone,
+							      base_type,
+							      i / 2);
+	      if ((i & 1) != 0)
+		clone->inbranch = 1;
+	    }
+
+	  tree id = simd_clone_mangle (node, clone);
+	  if (id == NULL_TREE)
+	    continue;
+
+	  struct cgraph_node *n = simd_clone_create (node);
+	  if (n == NULL)
+	    continue;
+
+	  n->simdclone = clone;
+	  clone->origin = node;
+	  clone->next_clone = NULL;
+	  if (node->simd_clones == NULL)
+	    {
+	      clone->prev_clone = n;
+	      node->simd_clones = n;
+	    }
+	  else
+	    {
+	      clone->prev_clone = node->simd_clones->simdclone->prev_clone;
+	      clone->prev_clone->simdclone->next_clone = n;
+	      node->simd_clones->simdclone->prev_clone = n;
+	    }
+	  change_decl_assembler_name (n->decl, id);
+	  if (node->definition)
+	    simd_clone_adjust (n);
+	  else
+	    {
+	      simd_clone_adjust_return_type (n);
+	      simd_clone_adjust_argument_types (n);
+	    }
+	}
+    }
+  while ((attr = lookup_attribute ("omp declare simd", TREE_CHAIN (attr))));
+}
+
+/* Entry point for IPA simd clone creation pass.  */
+
+static unsigned int
+ipa_omp_simd_clone (void)
+{
+  struct cgraph_node *node;
+  FOR_EACH_FUNCTION (node)
+    expand_simd_clones (node);
+  return 0;
+}
+
+namespace {
+
+const pass_data pass_data_omp_simd_clone =
+{
+  SIMPLE_IPA_PASS,		/* type */
+  "simdclone",			/* name */
+  OPTGROUP_NONE,		/* optinfo_flags */
+  true,				/* has_gate */
+  true,				/* has_execute */
+  TV_NONE,			/* tv_id */
+  ( PROP_ssa | PROP_cfg ),	/* properties_required */
+  0,				/* properties_provided */
+  0,				/* properties_destroyed */
+  0,				/* todo_flags_start */
+  0,				/* todo_flags_finish */
+};
+
+class pass_omp_simd_clone : public simple_ipa_opt_pass
+{
+public:
+  pass_omp_simd_clone(gcc::context *ctxt)
+    : simple_ipa_opt_pass(pass_data_omp_simd_clone, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  bool gate () { return flag_openmp || flag_openmp_simd
+			|| flag_enable_cilkplus; }
+  unsigned int execute () { return ipa_omp_simd_clone (); }
+};
+
+} // anon namespace
+
+simple_ipa_opt_pass *
+make_pass_omp_simd_clone (gcc::context *ctxt)
+{
+  return new pass_omp_simd_clone (ctxt);
+}
 
 #include "gt-omp-low.h"
--- gcc/passes.def	(.../trunk)	(revision 205223)
+++ gcc/passes.def	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -97,6 +97,7 @@  along with GCC; see the file COPYING3.
       NEXT_PASS (pass_feedback_split_functions);
   POP_INSERT_PASSES ()
   NEXT_PASS (pass_ipa_increase_alignment);
+  NEXT_PASS (pass_omp_simd_clone);
   NEXT_PASS (pass_ipa_tm);
   NEXT_PASS (pass_ipa_lower_emutls);
   TERMINATE_PASS_LIST ()
--- gcc/target.def	(.../trunk)	(revision 205223)
+++ gcc/target.def	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -1521,6 +1521,36 @@  hook_int_uint_mode_1)
 
 HOOK_VECTOR_END (sched)
 
+/* Functions relating to OpenMP and Cilk Plus SIMD clones.  */
+#undef HOOK_PREFIX
+#define HOOK_PREFIX "TARGET_SIMD_CLONE_"
+HOOK_VECTOR (TARGET_SIMD_CLONE, simd_clone)
+
+DEFHOOK
+(compute_vecsize_and_simdlen,
+"This hook should set @var{vecsize_mangle}, @var{vecsize_int}, @var{vecsize_float}\n\
+fields in @var{simd_clone} structure pointed by @var{clone_info} argument and also\n\
+@var{simdlen} field if it was previously 0.\n\
+The hook should return 0 if SIMD clones shouldn't be emitted,\n\
+or number of @var{vecsize_mangle} variants that should be emitted.",
+int, (struct cgraph_node *, struct cgraph_simd_clone *, tree, int), NULL)
+
+DEFHOOK
+(adjust,
+"This hook should add implicit @code{attribute(target(\"...\"))} attribute\n\
+to SIMD clone @var{node} if needed.",
+void, (struct cgraph_node *), NULL)
+
+DEFHOOK
+(usable,
+"This hook should return -1 if SIMD clone @var{node} shouldn't be used\n\
+in vectorized loops in current function, or non-negative number if it is\n\
+usable.  In that case, the smaller the number is, the more desirable it is\n\
+to use it.",
+int, (struct cgraph_node *), NULL)
+
+HOOK_VECTOR_END (simd_clone)
+
 /* Functions relating to vectorization.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_VECTORIZE_"
--- gcc/target.h	(.../trunk)	(revision 205223)
+++ gcc/target.h	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -93,6 +93,8 @@  extern bool target_default_pointer_addre
 struct stdarg_info;
 struct spec_info_def;
 struct hard_reg_set_container;
+struct cgraph_node;
+struct cgraph_simd_clone;
 
 /* The struct used by the secondary_reload target hook.  */
 typedef struct secondary_reload_info
--- gcc/tree-core.h	(.../trunk)	(revision 205223)
+++ gcc/tree-core.h	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -903,6 +903,9 @@  struct GTY(()) tree_base {
        CALL_ALLOCA_FOR_VAR_P in
            CALL_EXPR
 
+       OMP_CLAUSE_LINEAR_VARIABLE_STRIDE in
+	   OMP_CLAUSE_LINEAR
+
    side_effects_flag:
 
        TREE_SIDE_EFFECTS in
--- gcc/tree.h	(.../trunk)	(revision 205223)
+++ gcc/tree.h	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -1346,6 +1351,10 @@  extern void protected_set_expr_location
 #define OMP_CLAUSE_LINEAR_NO_COPYOUT(NODE) \
   TREE_PRIVATE (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_LINEAR))
 
+/* True if a LINEAR clause has a stride that is variable.  */
+#define OMP_CLAUSE_LINEAR_VARIABLE_STRIDE(NODE) \
+  TREE_PROTECTED (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_LINEAR))
+
 #define OMP_CLAUSE_LINEAR_STEP(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_LINEAR), 1)
 
--- gcc/tree-pass.h	(.../trunk)	(revision 205223)
+++ gcc/tree-pass.h	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -472,6 +472,7 @@  extern ipa_opt_pass_d *make_pass_ipa_ref
 extern ipa_opt_pass_d *make_pass_ipa_pure_const (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_pta (gcc::context *ctxt);
 extern simple_ipa_opt_pass *make_pass_ipa_tm (gcc::context *ctxt);
+extern simple_ipa_opt_pass *make_pass_omp_simd_clone (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_profile (gcc::context *ctxt);
 extern ipa_opt_pass_d *make_pass_ipa_cdtor_merge (gcc::context *ctxt);
 
--- gcc/tree-sra.c	(.../trunk)	(revision 205223)
+++ gcc/tree-sra.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -4277,9 +4277,10 @@  turn_representatives_into_adjustments (v
 	  adj.base_index = get_param_index (parm, parms);
 	  adj.base = parm;
 	  if (!repr)
-	    adj.copy_param = 1;
+	    adj.op = IPA_PARM_OP_COPY;
 	  else
-	    adj.remove_param = 1;
+	    adj.op = IPA_PARM_OP_REMOVE;
+	  adj.arg_prefix = "ISRA";
 	  adjustments.quick_push (adj);
 	}
       else
@@ -4299,6 +4300,7 @@  turn_representatives_into_adjustments (v
 	      adj.by_ref = (POINTER_TYPE_P (TREE_TYPE (repr->base))
 			    && (repr->grp_maybe_modified
 				|| repr->grp_not_necessarilly_dereferenced));
+	      adj.arg_prefix = "ISRA";
 	      adjustments.quick_push (adj);
 	    }
 	}
@@ -4429,7 +4431,7 @@  get_adjustment_for_base (ipa_parm_adjust
       struct ipa_parm_adjustment *adj;
 
       adj = &adjustments[i];
-      if (!adj->copy_param && adj->base == base)
+      if (adj->op != IPA_PARM_OP_COPY && adj->base == base)
 	return adj;
     }
 
@@ -4493,84 +4495,6 @@  replace_removed_params_ssa_names (gimple
   return true;
 }
 
-/* If the expression *EXPR should be replaced by a reduction of a parameter, do
-   so.  ADJUSTMENTS is a pointer to a vector of adjustments.  CONVERT
-   specifies whether the function should care about type incompatibility the
-   current and new expressions.  If it is false, the function will leave
-   incompatibility issues to the caller.  Return true iff the expression
-   was modified. */
-
-static bool
-sra_ipa_modify_expr (tree *expr, bool convert,
-		     ipa_parm_adjustment_vec adjustments)
-{
-  int i, len;
-  struct ipa_parm_adjustment *adj, *cand = NULL;
-  HOST_WIDE_INT offset, size, max_size;
-  tree base, src;
-
-  len = adjustments.length ();
-
-  if (TREE_CODE (*expr) == BIT_FIELD_REF
-      || TREE_CODE (*expr) == IMAGPART_EXPR
-      || TREE_CODE (*expr) == REALPART_EXPR)
-    {
-      expr = &TREE_OPERAND (*expr, 0);
-      convert = true;
-    }
-
-  base = get_ref_base_and_extent (*expr, &offset, &size, &max_size);
-  if (!base || size == -1 || max_size == -1)
-    return false;
-
-  if (TREE_CODE (base) == MEM_REF)
-    {
-      offset += mem_ref_offset (base).low * BITS_PER_UNIT;
-      base = TREE_OPERAND (base, 0);
-    }
-
-  base = get_ssa_base_param (base);
-  if (!base || TREE_CODE (base) != PARM_DECL)
-    return false;
-
-  for (i = 0; i < len; i++)
-    {
-      adj = &adjustments[i];
-
-      if (adj->base == base
-	  && (adj->offset == offset || adj->remove_param))
-	{
-	  cand = adj;
-	  break;
-	}
-    }
-  if (!cand || cand->copy_param || cand->remove_param)
-    return false;
-
-  if (cand->by_ref)
-    src = build_simple_mem_ref (cand->reduction);
-  else
-    src = cand->reduction;
-
-  if (dump_file && (dump_flags & TDF_DETAILS))
-    {
-      fprintf (dump_file, "About to replace expr ");
-      print_generic_expr (dump_file, *expr, 0);
-      fprintf (dump_file, " with ");
-      print_generic_expr (dump_file, src, 0);
-      fprintf (dump_file, "\n");
-    }
-
-  if (convert && !useless_type_conversion_p (TREE_TYPE (*expr), cand->type))
-    {
-      tree vce = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (*expr), src);
-      *expr = vce;
-    }
-  else
-    *expr = src;
-  return true;
-}
-
 /* If the statement pointed to by STMT_PTR contains any expressions that need
    to replaced with a different one as noted by ADJUSTMENTS, do so.  Handle any
    potential type incompatibilities (GSI is used to accommodate conversion
@@ -4591,8 +4515,8 @@  sra_ipa_modify_assign (gimple *stmt_ptr,
   rhs_p = gimple_assign_rhs1_ptr (stmt);
   lhs_p = gimple_assign_lhs_ptr (stmt);
 
-  any = sra_ipa_modify_expr (rhs_p, false, adjustments);
-  any |= sra_ipa_modify_expr (lhs_p, false, adjustments);
+  any = ipa_modify_expr (rhs_p, false, adjustments);
+  any |= ipa_modify_expr (lhs_p, false, adjustments);
   if (any)
     {
       tree new_rhs = NULL_TREE;
@@ -4638,7 +4562,7 @@  sra_ipa_modify_assign (gimple *stmt_ptr,
 /* Traverse the function body and all modifications as described in
    ADJUSTMENTS.  Return true iff the CFG has been changed.  */
 
-static bool
+bool
 ipa_sra_modify_function_body (ipa_parm_adjustment_vec adjustments)
 {
   bool cfg_changed = false;
@@ -4664,7 +4588,7 @@  ipa_sra_modify_function_body (ipa_parm_a
 	    case GIMPLE_RETURN:
 	      t = gimple_return_retval_ptr (stmt);
 	      if (*t != NULL_TREE)
-		modified |= sra_ipa_modify_expr (t, true, adjustments);
+		modified |= ipa_modify_expr (t, true, adjustments);
 	      break;
 
 	    case GIMPLE_ASSIGN:
@@ -4677,13 +4601,13 @@  ipa_sra_modify_function_body (ipa_parm_a
 	      for (i = 0; i < gimple_call_num_args (stmt); i++)
 		{
 		  t = gimple_call_arg_ptr (stmt, i);
-		  modified |= sra_ipa_modify_expr (t, true, adjustments);
+		  modified |= ipa_modify_expr (t, true, adjustments);
 		}
 
 	      if (gimple_call_lhs (stmt))
 		{
 		  t = gimple_call_lhs_ptr (stmt);
-		  modified |= sra_ipa_modify_expr (t, false, adjustments);
+		  modified |= ipa_modify_expr (t, false, adjustments);
 		  modified |= replace_removed_params_ssa_names (stmt,
 								adjustments);
 		}
@@ -4693,12 +4617,12 @@  ipa_sra_modify_function_body (ipa_parm_a
 	      for (i = 0; i < gimple_asm_ninputs (stmt); i++)
 		{
 		  t = &TREE_VALUE (gimple_asm_input_op (stmt, i));
-		  modified |= sra_ipa_modify_expr (t, true, adjustments);
+		  modified |= ipa_modify_expr (t, true, adjustments);
 		}
 	      for (i = 0; i < gimple_asm_noutputs (stmt); i++)
 		{
 		  t = &TREE_VALUE (gimple_asm_output_op (stmt, i));
-		  modified |= sra_ipa_modify_expr (t, false, adjustments);
+		  modified |= ipa_modify_expr (t, false, adjustments);
 		}
 	      break;
 
@@ -4744,7 +4668,7 @@  sra_ipa_reset_debug_stmts (ipa_parm_adju
       use_operand_p use_p;
 
       adj = &adjustments[i];
-      if (adj->copy_param || !is_gimple_reg (adj->base))
+      if (adj->op == IPA_PARM_OP_COPY || !is_gimple_reg (adj->base))
 	continue;
       name = ssa_default_def (cfun, adj->base);
       vexpr = NULL;
@@ -4927,7 +4851,7 @@  modify_function (struct cgraph_node *nod
   redirect_callers.release ();
 
   push_cfun (DECL_STRUCT_FUNCTION (new_node->decl));
-  ipa_modify_formal_parameters (current_function_decl, adjustments, "ISRA");
+  ipa_modify_formal_parameters (current_function_decl, adjustments);
   cfg_changed = ipa_sra_modify_function_body (adjustments);
   sra_ipa_reset_debug_stmts (adjustments);
   convert_callers (new_node, node->decl, adjustments);
--- gcc/tree-vect-data-refs.c	(.../trunk)	(revision 205223)
+++ gcc/tree-vect-data-refs.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -49,6 +49,7 @@  along with GCC; see the file COPYING3.
 #include "tree-scalar-evolution.h"
 #include "tree-vectorizer.h"
 #include "diagnostic-core.h"
+#include "cgraph.h"
 /* Need to include rtl.h, expr.h, etc. for optabs.  */
 #include "expr.h"
 #include "optabs.h"
@@ -3163,10 +3164,11 @@  vect_analyze_data_refs (loop_vec_info lo
 
   if (loop_vinfo)
     {
+      basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+
       loop = LOOP_VINFO_LOOP (loop_vinfo);
-      if (!find_loop_nest (loop, &LOOP_VINFO_LOOP_NEST (loop_vinfo))
-	  || find_data_references_in_loop
-	       (loop, &LOOP_VINFO_DATAREFS (loop_vinfo)))
+      datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
+      if (!find_loop_nest (loop, &LOOP_VINFO_LOOP_NEST (loop_vinfo)))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -3175,7 +3177,57 @@  vect_analyze_data_refs (loop_vec_info lo
 	  return false;
 	}
 
-      datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
+      for (i = 0; i < loop->num_nodes; i++)
+	{
+	  gimple_stmt_iterator gsi;
+
+	  for (gsi = gsi_start_bb (bbs[i]); !gsi_end_p (gsi); gsi_next (&gsi))
+	    {
+	      gimple stmt = gsi_stmt (gsi);
+	      if (!find_data_references_in_stmt (loop, stmt, &datarefs))
+		{
+		  if (is_gimple_call (stmt) && loop->safelen)
+		    {
+		      tree fndecl = gimple_call_fndecl (stmt), op;
+		      if (fndecl != NULL_TREE)
+			{
+			  struct cgraph_node *node = cgraph_get_node (fndecl);
+			  if (node != NULL && node->simd_clones != NULL)
+			    {
+			      unsigned int j, n = gimple_call_num_args (stmt);
+			      for (j = 0; j < n; j++)
+				{
+				  op = gimple_call_arg (stmt, j);
+				  if (DECL_P (op)
+				      || (REFERENCE_CLASS_P (op)
+					  && get_base_address (op)))
+				    break;
+				}
+			      op = gimple_call_lhs (stmt);
+			      /* Ignore #pragma omp declare simd functions
+				 if they don't have data references in the
+				 call stmt itself.  */
+			      if (j == n
+				  && !(op
+				       && (DECL_P (op)
+					   || (REFERENCE_CLASS_P (op)
+					       && get_base_address (op)))))
+				continue;
+			    }
+			}
+		    }
+		  LOOP_VINFO_DATAREFS (loop_vinfo) = datarefs;
+		  if (dump_enabled_p ())
+		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				     "not vectorized: loop contains function "
+				     "calls or data references that cannot "
+				     "be analyzed\n");
+		  return false;
+		}
+	    }
+	}
+
+      LOOP_VINFO_DATAREFS (loop_vinfo) = datarefs;
     }
   else
     {
--- gcc/tree-vect-loop.c	(.../trunk)	(revision 205223)
+++ gcc/tree-vect-loop.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -373,6 +373,36 @@  vect_determine_vectorization_factor (loo
 
 	  if (gimple_get_lhs (stmt) == NULL_TREE)
 	    {
+	      if (is_gimple_call (stmt))
+		{
+		  /* Ignore calls with no lhs.  These must be calls to
+		     #pragma omp simd functions, and what vectorization factor
+		     it really needs can't be determined until
+		     vectorizable_simd_clone_call.  */
+		  if (STMT_VINFO_VECTYPE (stmt_info) == NULL_TREE)
+		    {
+		      unsigned int j, n = gimple_call_num_args (stmt);
+		      for (j = 0; j < n; j++)
+			{
+			  scalar_type = TREE_TYPE (gimple_call_arg (stmt, j));
+			  vectype = get_vectype_for_scalar_type (scalar_type);
+			  if (vectype)
+			    {
+			      STMT_VINFO_VECTYPE (stmt_info) = vectype;
+			      break;
+			    }
+			}
+		    }
+		  if (STMT_VINFO_VECTYPE (stmt_info) != NULL_TREE)
+		    {
+		      if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
+			{
+			  pattern_def_seq = NULL;
+			  gsi_next (&si);
+			}
+		      continue;
+		    }
+		}
 	      if (dump_enabled_p ())
 		{
 	          dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
--- gcc/tree-vectorizer.h	(.../trunk)	(revision 205223)
+++ gcc/tree-vectorizer.h	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -443,6 +443,7 @@  enum stmt_vec_info_type {
   shift_vec_info_type,
   op_vec_info_type,
   call_vec_info_type,
+  call_simd_clone_vec_info_type,
   assignment_vec_info_type,
   condition_vec_info_type,
   reduc_vec_info_type,
--- gcc/tree-vect-stmts.c	(.../trunk)	(revision 205223)
+++ gcc/tree-vect-stmts.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -42,12 +42,15 @@  along with GCC; see the file COPYING3.
 #include "tree-ssanames.h"
 #include "tree-ssa-loop-manip.h"
 #include "cfgloop.h"
+#include "tree-ssa-loop.h"
+#include "tree-scalar-evolution.h"
 #include "expr.h"
 #include "recog.h"		/* FIXME: for insn_data */
 #include "optabs.h"
 #include "diagnostic-core.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
+#include "cgraph.h"
 
 /* For lang_hooks.types.type_for_mode.  */
 #include "langhooks.h"
@@ -1736,7 +1739,8 @@  vectorizable_call (gimple stmt, gimple_s
   if (!is_gimple_call (stmt))
     return false;
 
-  if (TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME)
+  if (gimple_call_lhs (stmt) == NULL_TREE
+      || TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME)
     return false;
 
   if (stmt_can_throw_internal (stmt))
@@ -2114,6 +2118,603 @@  vectorizable_call (gimple stmt, gimple_s
 }
 
 
+struct simd_call_arg_info
+{
+  tree vectype;
+  tree op;
+  enum vect_def_type dt;
+  HOST_WIDE_INT linear_step;
+  unsigned int align;
+};
+
+/* Function vectorizable_simd_clone_call.
+
+   Check if STMT performs a function call that can be vectorized
+   by calling a simd clone of the function.
+   If VEC_STMT is also passed, vectorize the STMT: create a vectorized
+   stmt to replace it, put it in VEC_STMT, and insert it at BSI.
+   Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
+
+static bool
+vectorizable_simd_clone_call (gimple stmt, gimple_stmt_iterator *gsi,
+			      gimple *vec_stmt, slp_tree slp_node)
+{
+  tree vec_dest;
+  tree scalar_dest;
+  tree op, type;
+  tree vec_oprnd0 = NULL_TREE;
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt), prev_stmt_info;
+  tree vectype;
+  unsigned int nunits;
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
+  struct loop *loop = loop_vinfo ? LOOP_VINFO_LOOP (loop_vinfo) : NULL;
+  tree fndecl, new_temp, def;
+  gimple def_stmt;
+  gimple new_stmt = NULL;
+  int ncopies, j;
+  vec<simd_call_arg_info> arginfo = vNULL;
+  vec<tree> vargs = vNULL;
+  size_t i, nargs;
+  tree lhs, rtype, ratype;
+  vec<constructor_elt, va_gc> *ret_ctor_elts;
+
+  /* Is STMT a vectorizable call?   */
+  if (!is_gimple_call (stmt))
+    return false;
+
+  fndecl = gimple_call_fndecl (stmt);
+  if (fndecl == NULL_TREE)
+    return false;
+
+  struct cgraph_node *node = cgraph_get_node (fndecl);
+  if (node == NULL || node->simd_clones == NULL)
+    return false;
+
+  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
+    return false;
+
+  if (gimple_call_lhs (stmt)
+      && TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME)
+    return false;
+
+  if (stmt_can_throw_internal (stmt))
+    return false;
+
+  vectype = STMT_VINFO_VECTYPE (stmt_info);
+
+  if (loop_vinfo && nested_in_vect_loop_p (loop, stmt))
+    return false;
+
+  /* FORNOW */
+  if (slp_node || PURE_SLP_STMT (stmt_info))
+    return false;
+
+  /* Process function arguments.  */
+  nargs = gimple_call_num_args (stmt);
+
+  /* Bail out if the function has zero arguments.  */
+  if (nargs == 0)
+    return false;
+
+  arginfo.create (nargs);
+
+  for (i = 0; i < nargs; i++)
+    {
+      simd_call_arg_info thisarginfo;
+      affine_iv iv;
+
+      thisarginfo.linear_step = 0;
+      thisarginfo.align = 0;
+      thisarginfo.op = NULL_TREE;
+
+      op = gimple_call_arg (stmt, i);
+      if (!vect_is_simple_use_1 (op, stmt, loop_vinfo, bb_vinfo,
+				 &def_stmt, &def, &thisarginfo.dt,
+				 &thisarginfo.vectype))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "use not simple.\n");
+	  arginfo.release ();
+	  return false;
+	}
+
+      if (thisarginfo.vectype != NULL_TREE
+	  && loop_vinfo
+	  && TREE_CODE (op) == SSA_NAME
+	  && simple_iv (loop, loop_containing_stmt (stmt), op, &iv, false)
+	  && tree_fits_shwi_p (iv.step))
+	{
+	  thisarginfo.linear_step = tree_to_shwi (iv.step);
+	  thisarginfo.op = iv.base;
+	}
+      else if (thisarginfo.vectype == NULL_TREE
+	       && POINTER_TYPE_P (TREE_TYPE (op)))
+	thisarginfo.align = get_pointer_alignment (op) / BITS_PER_UNIT;
+
+      arginfo.quick_push (thisarginfo);
+    }
+
+  unsigned int badness = 0;
+  struct cgraph_node *bestn = NULL;
+  for (struct cgraph_node *n = node->simd_clones; n != NULL;
+       n = n->simdclone->next_clone)
+    {
+      unsigned int this_badness = 0;
+      if (n->simdclone->simdlen
+	  > (unsigned) LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+	  || n->simdclone->nargs != nargs)
+	continue;
+      if (n->simdclone->simdlen
+	  < (unsigned) LOOP_VINFO_VECT_FACTOR (loop_vinfo))
+	this_badness += (exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo))
+			 - exact_log2 (n->simdclone->simdlen)) * 1024;
+      if (n->simdclone->inbranch)
+	this_badness += 2048;
+      int target_badness = targetm.simd_clone.usable (n);
+      if (target_badness < 0)
+	continue;
+      this_badness += target_badness * 512;
+      /* FORNOW: Have to add code to add the mask argument.  */
+      if (n->simdclone->inbranch)
+	continue;
+      for (i = 0; i < nargs; i++)
+	{
+	  switch (n->simdclone->args[i].arg_type)
+	    {
+	    case SIMD_CLONE_ARG_TYPE_VECTOR:
+	      if (!useless_type_conversion_p
+		     (n->simdclone->args[i].orig_type,
+		      TREE_TYPE (gimple_call_arg (stmt, i))))
+		i = -1;
+	      else if (arginfo[i].vectype == NULL_TREE
+		       || arginfo[i].linear_step)
+		this_badness += 64;
+	      break;
+	    case SIMD_CLONE_ARG_TYPE_UNIFORM:
+	      if (arginfo[i].vectype != NULL_TREE)
+		i = -1;
+	      break;
+	    case SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP:
+	      if (arginfo[i].vectype == NULL_TREE
+		  || (arginfo[i].linear_step
+		      != n->simdclone->args[i].linear_step))
+		i = -1;
+	      break;
+	    case SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP:
+	      /* FORNOW */
+	      i = -1;
+	      break;
+	    case SIMD_CLONE_ARG_TYPE_MASK:
+	      gcc_unreachable ();
+	    }
+	  if (i == (size_t) -1)
+	    break;
+	  if (n->simdclone->args[i].alignment > arginfo[i].align)
+	    {
+	      i = -1;
+	      break;
+	    }
+	  if (arginfo[i].align)
+	    this_badness += (exact_log2 (arginfo[i].align)
+			     - exact_log2 (n->simdclone->args[i].alignment));
+	}
+      if (i == (size_t) -1)
+	continue;
+      if (bestn == NULL || this_badness < badness)
+	{
+	  bestn = n;
+	  badness = this_badness;
+	}
+    }
+
+  if (bestn == NULL)
+    {
+      arginfo.release ();
+      return false;
+    }
+
+  for (i = 0; i < nargs; i++)
+    if (arginfo[i].vectype == NULL_TREE
+	&& bestn->simdclone->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
+      {
+	arginfo[i].vectype
+	  = get_vectype_for_scalar_type (TREE_TYPE (gimple_call_arg (stmt,
+								     i)));
+	if (arginfo[i].vectype == NULL
+	    || (TYPE_VECTOR_SUBPARTS (arginfo[i].vectype)
+		> bestn->simdclone->simdlen))
+	  {
+	    arginfo.release ();
+	    return false;
+	  }
+      }
+
+  fndecl = bestn->decl;
+  nunits = bestn->simdclone->simdlen;
+  ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+
+  /* If the function isn't const, only allow it in simd loops where user
+     has asserted that at least nunits consecutive iterations can be
+     performed using SIMD instructions.  */
+  if ((loop == NULL || (unsigned) loop->safelen < nunits)
+      && gimple_vuse (stmt))
+    {
+      arginfo.release ();
+      return false;
+    }
+
+  /* Sanity check: make sure that at least one copy of the vectorized stmt
+     needs to be generated.  */
+  gcc_assert (ncopies >= 1);
+
+  if (!vec_stmt) /* transformation not required.  */
+    {
+      STMT_VINFO_TYPE (stmt_info) = call_simd_clone_vec_info_type;
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "=== vectorizable_simd_clone_call ===\n");
+/*      vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL); */
+      arginfo.release ();
+      return true;
+    }
+
+  /** Transform.  **/
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location, "transform call.\n");
+
+  /* Handle def.  */
+  scalar_dest = gimple_call_lhs (stmt);
+  vec_dest = NULL_TREE;
+  rtype = NULL_TREE;
+  ratype = NULL_TREE;
+  if (scalar_dest)
+    {
+      vec_dest = vect_create_destination_var (scalar_dest, vectype);
+      rtype = TREE_TYPE (TREE_TYPE (fndecl));
+      if (TREE_CODE (rtype) == ARRAY_TYPE)
+	{
+	  ratype = rtype;
+	  rtype = TREE_TYPE (ratype);
+	}
+    }
+
+  prev_stmt_info = NULL;
+  for (j = 0; j < ncopies; ++j)
+    {
+      /* Build argument list for the vectorized call.  */
+      if (j == 0)
+	vargs.create (nargs);
+      else
+	vargs.truncate (0);
+
+      for (i = 0; i < nargs; i++)
+	{
+	  unsigned int k, l, m, o;
+	  tree atype;
+	  op = gimple_call_arg (stmt, i);
+	  switch (bestn->simdclone->args[i].arg_type)
+	    {
+	    case SIMD_CLONE_ARG_TYPE_VECTOR:
+	      atype = bestn->simdclone->args[i].vector_type;
+	      o = nunits / TYPE_VECTOR_SUBPARTS (atype);
+	      for (m = j * o; m < (j + 1) * o; m++)
+		{
+		  if (TYPE_VECTOR_SUBPARTS (atype)
+		      < TYPE_VECTOR_SUBPARTS (arginfo[i].vectype))
+		    {
+		      unsigned int prec = GET_MODE_BITSIZE (TYPE_MODE (atype));
+		      k = (TYPE_VECTOR_SUBPARTS (arginfo[i].vectype)
+			   / TYPE_VECTOR_SUBPARTS (atype));
+		      gcc_assert ((k & (k - 1)) == 0);
+		      if (m == 0)
+			vec_oprnd0
+			  = vect_get_vec_def_for_operand (op, stmt, NULL);
+		      else
+			{
+			  vec_oprnd0 = arginfo[i].op;
+			  if ((m & (k - 1)) == 0)
+			    vec_oprnd0
+			      = vect_get_vec_def_for_stmt_copy (arginfo[i].dt,
+								vec_oprnd0);
+			}
+		      arginfo[i].op = vec_oprnd0;
+		      vec_oprnd0
+			= build3 (BIT_FIELD_REF, atype, vec_oprnd0,
+				  build_int_cst (integer_type_node, prec),
+				  build_int_cst (integer_type_node,
+						 (m & (k - 1)) * prec));
+		      new_stmt
+			= gimple_build_assign_with_ops (BIT_FIELD_REF,
+							make_ssa_name (atype,
+								       NULL),
+							vec_oprnd0, NULL_TREE);
+		      vect_finish_stmt_generation (stmt, new_stmt, gsi);
+		      vargs.safe_push (gimple_assign_lhs (new_stmt));
+		    }
+		  else
+		    {
+		      k = (TYPE_VECTOR_SUBPARTS (atype)
+			   / TYPE_VECTOR_SUBPARTS (arginfo[i].vectype));
+		      gcc_assert ((k & (k - 1)) == 0);
+		      vec<constructor_elt, va_gc> *ctor_elts;
+		      if (k != 1)
+			vec_alloc (ctor_elts, k);
+		      else
+			ctor_elts = NULL;
+		      for (l = 0; l < k; l++)
+			{
+			  if (m == 0 && l == 0)
+			    vec_oprnd0
+			      = vect_get_vec_def_for_operand (op, stmt, NULL);
+			  else
+			    vec_oprnd0
+			      = vect_get_vec_def_for_stmt_copy (arginfo[i].dt,
+								arginfo[i].op);
+			  arginfo[i].op = vec_oprnd0;
+			  if (k == 1)
+			    break;
+			  CONSTRUCTOR_APPEND_ELT (ctor_elts, NULL_TREE,
+						  vec_oprnd0);
+			}
+		      if (k == 1)
+			vargs.safe_push (vec_oprnd0);
+		      else
+			{
+			  vec_oprnd0 = build_constructor (atype, ctor_elts);
+			  new_stmt
+			    = gimple_build_assign_with_ops
+				(CONSTRUCTOR, make_ssa_name (atype, NULL),
+				 vec_oprnd0, NULL_TREE);
+			  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+			  vargs.safe_push (gimple_assign_lhs (new_stmt));
+			}
+		    }
+		}
+	      break;
+	    case SIMD_CLONE_ARG_TYPE_UNIFORM:
+	      vargs.safe_push (op);
+	      break;
+	    case SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP:
+	      if (j == 0)
+		{
+		  gimple_seq stmts;
+		  arginfo[i].op
+		    = force_gimple_operand (arginfo[i].op, &stmts, true,
+					    NULL_TREE);
+		  if (stmts != NULL)
+		    {
+		      basic_block new_bb;
+		      edge pe = loop_preheader_edge (loop);
+		      new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);
+		      gcc_assert (!new_bb);
+		    }
+		  tree phi_res = copy_ssa_name (op, NULL);
+		  gimple new_phi = create_phi_node (phi_res, loop->header);
+		  set_vinfo_for_stmt (new_phi,
+				      new_stmt_vec_info (new_phi, loop_vinfo,
+							 NULL));
+		  add_phi_arg (new_phi, arginfo[i].op,
+			       loop_preheader_edge (loop), UNKNOWN_LOCATION);
+		  enum tree_code code
+		    = POINTER_TYPE_P (TREE_TYPE (op))
+		      ? POINTER_PLUS_EXPR : PLUS_EXPR;
+		  tree type = POINTER_TYPE_P (TREE_TYPE (op))
+			      ? sizetype : TREE_TYPE (op);
+		  double_int cst
+		    = double_int::from_shwi (arginfo[i].linear_step);
+		  cst *= double_int::from_uhwi (ncopies * nunits);
+		  tree tcst = double_int_to_tree (type, cst);
+		  tree phi_arg = copy_ssa_name (op, NULL);
+		  new_stmt = gimple_build_assign_with_ops (code, phi_arg,
+							   phi_res, tcst);
+		  gimple_stmt_iterator si = gsi_after_labels (loop->header);
+		  gsi_insert_after (&si, new_stmt, GSI_NEW_STMT);
+		  set_vinfo_for_stmt (new_stmt,
+				      new_stmt_vec_info (new_stmt, loop_vinfo,
+							 NULL));
+		  add_phi_arg (new_phi, phi_arg, loop_latch_edge (loop),
+			       UNKNOWN_LOCATION);
+		  arginfo[i].op = phi_res;
+		  vargs.safe_push (phi_res);
+		}
+	      else
+		{
+		  enum tree_code code
+		    = POINTER_TYPE_P (TREE_TYPE (op))
+		      ? POINTER_PLUS_EXPR : PLUS_EXPR;
+		  tree type = POINTER_TYPE_P (TREE_TYPE (op))
+			      ? sizetype : TREE_TYPE (op);
+		  double_int cst
+		    = double_int::from_shwi (arginfo[i].linear_step);
+		  cst *= double_int::from_uhwi (j * nunits);
+		  tree tcst = double_int_to_tree (type, cst);
+		  new_temp = make_ssa_name (TREE_TYPE (op), NULL);
+		  new_stmt
+		    = gimple_build_assign_with_ops (code, new_temp,
+						    arginfo[i].op, tcst);
+		  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+		  vargs.safe_push (new_temp);
+		}
+	      break;
+	    case SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP:
+	    default:
+	      gcc_unreachable ();
+	    }
+	}
+
+      new_stmt = gimple_build_call_vec (fndecl, vargs);
+      if (vec_dest)
+	{
+	  gcc_assert (ratype || TYPE_VECTOR_SUBPARTS (rtype) == nunits);
+	  if (ratype)
+	    new_temp = create_tmp_var (ratype, NULL);
+	  else if (TYPE_VECTOR_SUBPARTS (vectype)
+		   == TYPE_VECTOR_SUBPARTS (rtype))
+	    new_temp = make_ssa_name (vec_dest, new_stmt);
+	  else
+	    new_temp = make_ssa_name (rtype, new_stmt);
+	  gimple_call_set_lhs (new_stmt, new_temp);
+	}
+      vect_finish_stmt_generation (stmt, new_stmt, gsi);
+
+      if (vec_dest)
+	{
+	  if (TYPE_VECTOR_SUBPARTS (vectype) < nunits)
+	    {
+	      unsigned int k, l;
+	      unsigned int prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
+	      k = nunits / TYPE_VECTOR_SUBPARTS (vectype);
+	      gcc_assert ((k & (k - 1)) == 0);
+	      for (l = 0; l < k; l++)
+		{
+		  tree t;
+		  if (ratype)
+		    {
+		      t = build_fold_addr_expr (new_temp);
+		      t = build2 (MEM_REF, vectype, t,
+				  build_int_cst (TREE_TYPE (t),
+						 l * prec / BITS_PER_UNIT));
+		    }
+		  else
+		    t = build3 (BIT_FIELD_REF, vectype, new_temp,
+				build_int_cst (integer_type_node, prec),
+				build_int_cst (integer_type_node, l * prec));
+		  new_stmt
+		    = gimple_build_assign_with_ops (TREE_CODE (t),
+						    make_ssa_name (vectype,
+								   NULL),
+						    t, NULL_TREE);
+		  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+		  if (j == 0 && l == 0)
+		    STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+		  else
+		    STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+
+		  prev_stmt_info = vinfo_for_stmt (new_stmt);
+		}
+
+	      if (ratype)
+		{
+		  tree clobber = build_constructor (ratype, NULL);
+		  TREE_THIS_VOLATILE (clobber) = 1;
+		  new_stmt = gimple_build_assign (new_temp, clobber);
+		  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+		}
+	      continue;
+	    }
+	  else if (TYPE_VECTOR_SUBPARTS (vectype) > nunits)
+	    {
+	      unsigned int k = (TYPE_VECTOR_SUBPARTS (vectype)
+				/ TYPE_VECTOR_SUBPARTS (rtype));
+	      gcc_assert ((k & (k - 1)) == 0);
+	      if ((j & (k - 1)) == 0)
+		vec_alloc (ret_ctor_elts, k);
+	      if (ratype)
+		{
+		  unsigned int m, o = nunits / TYPE_VECTOR_SUBPARTS (rtype);
+		  for (m = 0; m < o; m++)
+		    {
+		      tree tem = build4 (ARRAY_REF, rtype, new_temp,
+					 size_int (m), NULL_TREE, NULL_TREE);
+		      new_stmt
+			= gimple_build_assign_with_ops (ARRAY_REF, rtype,
+							make_ssa_name (rtype,
+								       NULL),
+							tem);
+		      vect_finish_stmt_generation (stmt, new_stmt, gsi);
+		      CONSTRUCTOR_APPEND_ELT (ret_ctor_elts, NULL_TREE, tem);
+		    }
+		  tree clobber = build_constructor (ratype, NULL);
+		  TREE_THIS_VOLATILE (clobber) = 1;
+		  new_stmt = gimple_build_assign (new_temp, clobber);
+		  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+		}
+	      else
+		CONSTRUCTOR_APPEND_ELT (ret_ctor_elts, NULL_TREE, new_temp);
+	      if ((j & (k - 1)) != k - 1)
+		continue;
+	      vec_oprnd0 = build_constructor (vectype, ret_ctor_elts);
+	      new_stmt
+		= gimple_build_assign_with_ops (CONSTRUCTOR,
+						make_ssa_name (vec_dest, NULL),
+						vec_oprnd0, NULL_TREE);
+	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
+
+	      if ((unsigned) j == k - 1)
+		STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+	      else
+		STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+
+	      prev_stmt_info = vinfo_for_stmt (new_stmt);
+	      continue;
+	    }
+	  else if (ratype)
+	    {
+	      tree t = build_fold_addr_expr (new_temp);
+	      t = build2 (MEM_REF, vectype, t,
+			  build_int_cst (TREE_TYPE (t), 0));
+	      new_stmt
+		= gimple_build_assign_with_ops (MEM_REF, vectype,
+						make_ssa_name (vec_dest,
+							       NULL), t);
+	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	      tree clobber = build_constructor (ratype, NULL);
+	      TREE_THIS_VOLATILE (clobber) = 1;
+	      vect_finish_stmt_generation (stmt,
+					   gimple_build_assign (new_temp,
+								clobber), gsi);
+	    }
+	}
+
+      if (j == 0)
+	STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+      else
+	STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+
+      prev_stmt_info = vinfo_for_stmt (new_stmt);
+    }
+
+  vargs.release ();
+
+  /* Update the exception handling table with the vector stmt if necessary.  */
+  if (maybe_clean_or_replace_eh_stmt (stmt, *vec_stmt))
+    gimple_purge_dead_eh_edges (gimple_bb (stmt));
+
+  /* The call in STMT might prevent it from being removed in dce.
+     We however cannot remove it here, due to the way the ssa name
+     it defines is mapped to the new definition.  So just replace
+     rhs of the statement with something harmless.  */
+
+  if (slp_node)
+    return true;
+
+  if (scalar_dest)
+    {
+      type = TREE_TYPE (scalar_dest);
+      if (is_pattern_stmt_p (stmt_info))
+	lhs = gimple_call_lhs (STMT_VINFO_RELATED_STMT (stmt_info));
+      else
+	lhs = gimple_call_lhs (stmt);
+      new_stmt = gimple_build_assign (lhs, build_zero_cst (type));
+    }
+  else
+    new_stmt = gimple_build_nop ();
+  set_vinfo_for_stmt (new_stmt, stmt_info);
+  set_vinfo_for_stmt (stmt, NULL);
+  STMT_VINFO_STMT (stmt_info) = new_stmt;
+  gsi_replace (gsi, new_stmt, false);
+  unlink_stmt_vdef (stmt);
+
+  return true;
+}
+
+
 /* Function vect_gen_widened_results_half
 
    Create a vector stmt whose code, type, number of arguments, and result
@@ -5838,6 +6439,7 @@  vect_analyze_stmt (gimple stmt, bool *ne
             || vectorizable_assignment (stmt, NULL, NULL, NULL)
             || vectorizable_load (stmt, NULL, NULL, NULL, NULL)
 	    || vectorizable_call (stmt, NULL, NULL, NULL)
+	    || vectorizable_simd_clone_call (stmt, NULL, NULL, NULL)
             || vectorizable_store (stmt, NULL, NULL, NULL)
             || vectorizable_reduction (stmt, NULL, NULL, NULL)
             || vectorizable_condition (stmt, NULL, NULL, NULL, 0, NULL));
@@ -5850,6 +6452,7 @@  vect_analyze_stmt (gimple stmt, bool *ne
                 || vectorizable_assignment (stmt, NULL, NULL, node)
                 || vectorizable_load (stmt, NULL, NULL, node, NULL)
 		|| vectorizable_call (stmt, NULL, NULL, node)
+		|| vectorizable_simd_clone_call (stmt, NULL, NULL, node)
                 || vectorizable_store (stmt, NULL, NULL, node)
                 || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node));
       }
@@ -5972,6 +6575,11 @@  vect_transform_stmt (gimple stmt, gimple
       stmt = gsi_stmt (*gsi);
       break;
 
+    case call_simd_clone_vec_info_type:
+      done = vectorizable_simd_clone_call (stmt, gsi, &vec_stmt, slp_node);
+      stmt = gsi_stmt (*gsi);
+      break;
+
     case reduc_vec_info_type:
       done = vectorizable_reduction (stmt, gsi, &vec_stmt, slp_node);
       gcc_assert (done);
--- gcc/c/c-decl.c	(.../trunk)	(revision 205223)
+++ gcc/c/c-decl.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -3646,8 +3646,9 @@  c_builtin_function_ext_scope (tree decl)
   const char *name = IDENTIFIER_POINTER (id);
   C_DECL_BUILTIN_PROTOTYPE (decl) = prototype_p (type);
 
-  bind (id, decl, external_scope, /*invisible=*/false, /*nested=*/false,
-	UNKNOWN_LOCATION);
+  if (external_scope)
+    bind (id, decl, external_scope, /*invisible=*/false, /*nested=*/false,
+	  UNKNOWN_LOCATION);
 
   /* Builtins in the implementation namespace are made visible without
      needing to be explicitly declared.  See push_file_scope.  */
--- gcc/cp/semantics.c	(.../trunk)	(revision 205223)
+++ gcc/cp/semantics.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -5214,6 +5214,8 @@  finish_omp_clauses (tree clauses)
 	      t = mark_rvalue_use (t);
 	      if (!processing_template_decl)
 		{
+		  if (TREE_CODE (OMP_CLAUSE_DECL (c)) == PARM_DECL)
+		    t = maybe_constant_value (t);
 		  t = fold_build_cleanup_point_expr (TREE_TYPE (t), t);
 		  if (TREE_CODE (TREE_TYPE (OMP_CLAUSE_DECL (c)))
 		      == POINTER_TYPE)
--- gcc/testsuite/gcc.dg/gomp/simd-clones-1.c	(.../trunk)	(revision 0)
+++ gcc/testsuite/gcc.dg/gomp/simd-clones-1.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -0,0 +1,33 @@ 
+/* { dg-do compile } */
+/* { dg-options "-fopenmp -fdump-tree-optimized -O3" } */
+
+/* Test that functions that have SIMD clone counterparts are not
+   cloned by IPA-cp.  For example, special_add() below has SIMD clones
+   created for it.  However, if IPA-cp later decides to clone a
+   specialization of special_add(x, 666) when analyzing fillit(), we
+   will forever keep the vectorizer from using the SIMD versions of
+   special_add in a loop.
+
+   If IPA-CP gets taught how to adjust the SIMD clones as well, this
+   test could be removed.  */
+
+#pragma omp declare simd simdlen(4)
+static int  __attribute__ ((noinline))
+special_add (int x, int y)
+{
+  if (y == 666)
+    return x + y + 123;
+  else
+    return x + y;
+}
+
+void fillit(int *tot)
+{
+  int i;
+
+  for (i=0; i < 10000; ++i)
+    tot[i] = special_add (i, 666);
+}
+
+/* { dg-final { scan-tree-dump-not "special_add.constprop" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
--- gcc/testsuite/gcc.dg/gomp/simd-clones-2.c	(.../trunk)	(revision 0)
+++ gcc/testsuite/gcc.dg/gomp/simd-clones-2.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -0,0 +1,26 @@ 
+/* { dg-options "-fopenmp -fdump-tree-optimized -O" } */
+
+#pragma omp declare simd inbranch uniform(c) linear(b:66)
+#pragma omp declare simd notinbranch aligned(c:32)
+int addit(int a, int b, int *c)
+{
+  return a + b;
+}
+
+#pragma omp declare simd uniform(a) aligned(a:32) linear(k:1) notinbranch
+float setArray(float *a, float x, int k)
+{
+  a[k] = a[k] + x;
+  return a[k];
+}
+
+/* { dg-final { scan-tree-dump "_ZGVbN4ua32vl_setArray" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVbN4vvva32_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVbM4vl66u_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVcN8ua32vl_setArray" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVcN4vvva32_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVcM4vl66u_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVdN8ua32vl_setArray" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVdN8vvva32_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVdM8vl66u_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
--- gcc/testsuite/gcc.dg/gomp/simd-clones-3.c	(.../trunk)	(revision 0)
+++ gcc/testsuite/gcc.dg/gomp/simd-clones-3.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -0,0 +1,18 @@ 
+/* { dg-options "-fopenmp -fdump-tree-optimized -O2" } */
+
+/* Test that if there is no *inbranch clauses, that both the masked and
+   the unmasked version are created.  */
+
+#pragma omp declare simd
+int addit(int a, int b, int c)
+{
+  return a + b;
+}
+
+/* { dg-final { scan-tree-dump "_ZGVbN4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVbM4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVcN4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVcM4vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVdN8vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { scan-tree-dump "_ZGVdM8vvv_addit" "optimized" { target i?86-*-* x86_64-*-* } } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
--- gcc/testsuite/gcc.dg/gomp/simd-clones-4.c	(.../trunk)	(revision 0)
+++ gcc/testsuite/gcc.dg/gomp/simd-clones-4.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -0,0 +1,11 @@ 
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-fopenmp" } */
+
+#pragma omp declare simd simdlen(4) notinbranch
+int f2 (int a, int b)
+{
+  if (a > 5)
+    return a + b;
+  else
+    return a - b;
+}
--- gcc/testsuite/gcc.dg/gomp/simd-clones-5.c	(.../trunk)	(revision 0)
+++ gcc/testsuite/gcc.dg/gomp/simd-clones-5.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -0,0 +1,12 @@ 
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-fopenmp -w" } */
+
+/* ?? The -w above is to inhibit the following warning for now:
+   a.c:2:6: warning: AVX vector argument without AVX enabled changes
+   the ABI [enabled by default].  */
+
+#pragma omp declare simd notinbranch simdlen(4)
+void foo (int *a)
+{
+  *a = 555;
+}
--- gcc/testsuite/gcc.dg/gomp/simd-clones-6.c	(.../trunk)	(revision 0)
+++ gcc/testsuite/gcc.dg/gomp/simd-clones-6.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -0,0 +1,11 @@ 
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-fopenmp" } */
+
+/* Test that array subscripts are properly adjusted.  */
+
+int array[1000];
+#pragma omp declare simd notinbranch simdlen(4)
+void foo (int i)
+{
+  array[i] = 555;
+}
--- gcc/testsuite/gcc.dg/gomp/simd-clones-7.c	(.../trunk)	(revision 0)
+++ gcc/testsuite/gcc.dg/gomp/simd-clones-7.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -0,0 +1,16 @@ 
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-fopenmp -w" } */
+
+int array[1000];
+
+#pragma omp declare simd notinbranch simdlen(4)
+void foo (int *a, int b)
+{
+  a[b] = 555;
+}
+
+#pragma omp declare simd notinbranch simdlen(4)
+void bar (int *a)
+{
+  *a = 555;
+}
--- gcc/testsuite/gcc.dg/vect/vect-simd-clone-10a.c	(.../trunk)	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-10a.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -0,0 +1,17 @@ 
+/* { dg-do compile } */
+
+#include "vect-simd-clone-10.h"
+
+#pragma omp declare simd notinbranch
+extern int
+foo (long int a, int b, int c)
+{
+  return a + b + c;
+}
+
+#pragma omp declare simd notinbranch
+extern long int
+bar (int a, int b, long int c)
+{
+  return a + b + c;
+}
--- gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c	(.../trunk)	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -0,0 +1,83 @@ 
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+/* { dg-additional-sources vect-simd-clone-10a.c } */
+
+#include "tree-vect.h"
+
+#ifndef N
+#define N 1024
+#endif
+
+int a[N], b[N];
+long int c[N];
+unsigned char d[N];
+
+#include "vect-simd-clone-10.h"
+
+__attribute__((noinline)) void
+fn1 (void)
+{
+  int i;
+  #pragma omp simd
+  for (i = 0; i < N; i++)
+    a[i] = foo (c[i], a[i], b[i]) + 6;
+  #pragma omp simd
+  for (i = 0; i < N; i++)
+    c[i] = bar (a[i], b[i], c[i]) * 2;
+}
+
+__attribute__((noinline)) void
+fn2 (void)
+{
+  int i;
+  #pragma omp simd
+  for (i = 0; i < N; i++)
+    {
+      a[i] = foo (c[i], a[i], b[i]) + 6;
+      d[i]++;
+    }
+  #pragma omp simd
+  for (i = 0; i < N; i++)
+    {
+      c[i] = bar (a[i], b[i], c[i]) * 2;
+      d[i] /= 2;
+    }
+}
+
+__attribute__((noinline)) void
+fn3 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    {
+      a[i] = i * 2;
+      b[i] = 17 + (i % 37);
+      c[i] = (i & 63);
+      d[i] = 16 + i;
+    }
+}
+
+int
+main ()
+{
+  int i;
+  check_vect ();
+  fn3 ();
+  fn1 ();
+  for (i = 0; i < N; i++)
+    if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
+	|| b[i] != 17 + (i % 37)
+	|| c[i] != i * 4 + 80 + 4 * (i % 37) + 4 * (i & 63))
+      abort ();
+  fn3 ();
+  fn2 ();
+  for (i = 0; i < N; i++)
+    if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
+	|| b[i] != 17 + (i % 37)
+	|| c[i] != i * 4 + 80 + 4 * (i % 37) + 4 * (i & 63)
+	|| d[i] != ((unsigned char) (17 + i)) / 2)
+      abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.h	(.../trunk)	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-10.h	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -0,0 +1,4 @@ 
+#pragma omp declare simd notinbranch
+extern int foo (long int a, int b, int c);
+#pragma omp declare simd notinbranch
+extern long int bar (int a, int b, long int c);
--- gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c	(.../trunk)	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-1.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -0,0 +1,58 @@ 
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#include "tree-vect.h"
+
+#ifndef N
+#define N 1024
+#endif
+
+int array[N];
+
+#pragma omp declare simd simdlen(4) notinbranch
+#pragma omp declare simd simdlen(4) notinbranch uniform(b) linear(c:3)
+#pragma omp declare simd simdlen(8) notinbranch
+#pragma omp declare simd simdlen(8) notinbranch uniform(b) linear(c:3)
+__attribute__((noinline)) int
+foo (int a, int b, int c)
+{
+  if (a < 30)
+    return 5;
+  return a + b + c;
+}
+
+__attribute__((noinline, noclone)) void
+bar ()
+{
+  int i;
+#pragma omp simd
+  for (i = 0; i < N; ++i)
+    array[i] = foo (i, 123, i * 3);
+}
+
+__attribute__((noinline, noclone)) void
+baz ()
+{
+  int i;
+#pragma omp simd
+  for (i = 0; i < N; ++i)
+    array[i] = foo (i, array[i], i * 3);
+}
+
+int
+main ()
+{
+  int i;
+  check_vect ();
+  bar ();
+  for (i = 0; i < N; i++)
+    if (array[i] != (i < 30 ? 5 : i * 4 + 123))
+      abort ();
+  baz ();
+  for (i = 0; i < N; i++)
+    if (array[i] != (i < 30 ? 5 : i * 8 + 123))
+      abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c	(.../trunk)	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-2.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -0,0 +1,52 @@ 
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#include "tree-vect.h"
+
+#ifndef N
+#define N 1024
+#endif
+
+int array[N] __attribute__((aligned (32)));
+
+#pragma omp declare simd simdlen(4) notinbranch aligned(a:16) uniform(a) linear(b)
+#pragma omp declare simd simdlen(4) notinbranch aligned(a:32) uniform(a) linear(b)
+#pragma omp declare simd simdlen(8) notinbranch aligned(a:16) uniform(a) linear(b)
+#pragma omp declare simd simdlen(8) notinbranch aligned(a:32) uniform(a) linear(b)
+__attribute__((noinline)) void
+foo (int *a, int b, int c)
+{
+  a[b] = c;
+}
+
+__attribute__((noinline, noclone)) void
+bar ()
+{
+  int i;
+#pragma omp simd
+  for (i = 0; i < N; ++i)
+    foo (array, i, i * array[i]);
+}
+
+__attribute__((noinline, noclone)) void
+baz ()
+{
+  int i;
+  for (i = 0; i < N; i++)
+    array[i] = 5 * (i & 7);
+}
+
+int
+main ()
+{
+  int i;
+  check_vect ();
+  baz ();
+  bar ();
+  for (i = 0; i < N; i++)
+    if (array[i] != 5 * (i & 7) * i)
+      abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c	(.../trunk)	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-3.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -0,0 +1,45 @@ 
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#include "tree-vect.h"
+
+#ifndef N
+#define N 1024
+#endif
+
+int d[N], e[N];
+
+#pragma omp declare simd simdlen(4) notinbranch uniform(b) linear(c:3)
+__attribute__((noinline)) int
+foo (int a, int b, int c)
+{
+  if (a < 30)
+    return 5;
+  return a + b + c;
+}
+
+__attribute__((noinline, noclone)) void
+bar ()
+{
+  int i;
+#pragma omp simd
+  for (i = 0; i < N; ++i)
+    {
+      d[i] = foo (i, 123, i * 3);
+      e[i] = e[i] + i;
+    }
+}
+
+int
+main ()
+{
+  int i;
+  check_vect ();
+  bar ();
+  for (i = 0; i < N; i++)
+    if (d[i] != (i < 30 ? 5 : i * 4 + 123) || e[i] != i)
+      abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c	(.../trunk)	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-4.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -0,0 +1,48 @@ 
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#include "tree-vect.h"
+
+#ifndef N
+#define N 1024
+#endif
+
+float d[N];
+int e[N];
+unsigned short f[N];
+
+#pragma omp declare simd simdlen(8) notinbranch uniform(b)
+__attribute__((noinline)) float
+foo (float a, float b, float c)
+{
+  if (a < 30)
+    return 5.0f;
+  return a + b + c;
+}
+
+__attribute__((noinline, noclone)) void
+bar ()
+{
+  int i;
+#pragma omp simd
+  for (i = 0; i < N; ++i)
+    {
+      d[i] = foo (i, 123, i * 3);
+      e[i] = e[i] * 3;
+      f[i] = f[i] + 1;
+    }
+}
+
+int
+main ()
+{
+  int i;
+  check_vect ();
+  bar ();
+  for (i = 0; i < N; i++)
+    if (d[i] != (i < 30 ? 5.0f : i * 4 + 123.0f) || e[i] || f[i] != 1)
+      abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c	(.../trunk)	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-5.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -0,0 +1,43 @@ 
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#include "tree-vect.h"
+
+#ifndef N
+#define N 1024
+#endif
+
+int d[N], e[N];
+
+#pragma omp declare simd simdlen(4) notinbranch uniform(b) linear(c:3)
+__attribute__((noinline)) long long int
+foo (int a, int b, int c)
+{
+  return a + b + c;
+}
+
+__attribute__((noinline, noclone)) void
+bar ()
+{
+  int i;
+#pragma omp simd
+  for (i = 0; i < N; ++i)
+    {
+      d[i] = foo (i, 123, i * 3);
+      e[i] = e[i] + i;
+    }
+}
+
+int
+main ()
+{
+  int i;
+  check_vect ();
+  bar ();
+  for (i = 0; i < N; i++)
+    if (d[i] != i * 4 + 123 || e[i] != i)
+      abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- gcc/testsuite/gcc.dg/vect/vect-simd-clone-6.c	(.../trunk)	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-6.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -0,0 +1,74 @@ 
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#include "tree-vect.h"
+
+#ifndef N
+#define N 1024
+#endif
+
+int a[N];
+long long int b[N];
+short c[N];
+
+#pragma omp declare simd
+#pragma omp declare simd uniform(b) linear(c:3)
+__attribute__((noinline)) short
+foo (int a, long long int b, short c)
+{
+  return a + b + c;
+}
+
+__attribute__((noinline, noclone)) void
+bar (int x)
+{
+  int i;
+  if (x == 0)
+    {
+    #pragma omp simd
+      for (i = 0; i < N; i++)
+	c[i] = foo (a[i], b[i], c[i]);
+    }
+  else
+    {
+    #pragma omp simd
+      for (i = 0; i < N; i++)
+	c[i] = foo (a[i], x, i * 3);
+    }
+}
+
+__attribute__((noinline, noclone)) void
+baz (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    {
+      a[i] = 2 * i;
+      b[i] = -7 * i + 6;
+      c[i] = (i & 31) << 4;
+    }
+}
+
+int
+main ()
+{
+  int i;
+  check_vect ();
+  baz ();
+  bar (0);
+  for (i = 0; i < N; i++)
+    if (a[i] != 2 * i || b[i] != 6 - 7 * i
+	|| c[i] != 6 - 5 * i + ((i & 31) << 4))
+      abort ();
+    else
+      a[i] = c[i];
+  bar (17);
+  for (i = 0; i < N; i++)
+    if (a[i] != 6 - 5 * i + ((i & 31) << 4)
+	|| b[i] != 6 - 7 * i
+	|| c[i] != 23 - 2 * i + ((i & 31) << 4))
+      abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- gcc/testsuite/gcc.dg/vect/vect-simd-clone-7.c	(.../trunk)	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-7.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -0,0 +1,74 @@ 
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#include "tree-vect.h"
+
+#ifndef N
+#define N 1024
+#endif
+
+int a[N];
+long long int b[N];
+short c[N];
+
+#pragma omp declare simd
+#pragma omp declare simd uniform(b) linear(c:3)
+__attribute__((noinline)) short
+foo (int a, long long int b, int c)
+{
+  return a + b + c;
+}
+
+__attribute__((noinline, noclone)) void
+bar (int x)
+{
+  int i;
+  if (x == 0)
+    {
+    #pragma omp simd
+      for (i = 0; i < N; i++)
+	c[i] = foo (a[i], b[i], c[i]);
+    }
+  else
+    {
+    #pragma omp simd
+      for (i = 0; i < N; i++)
+	c[i] = foo (a[i], x, i * 3);
+    }
+}
+
+__attribute__((noinline, noclone)) void
+baz (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    {
+      a[i] = 2 * i;
+      b[i] = -7 * i + 6;
+      c[i] = (i & 31) << 4;
+    }
+}
+
+int
+main ()
+{
+  int i;
+  check_vect ();
+  baz ();
+  bar (0);
+  for (i = 0; i < N; i++)
+    if (a[i] != 2 * i || b[i] != 6 - 7 * i
+	|| c[i] != 6 - 5 * i + ((i & 31) << 4))
+      abort ();
+    else
+      a[i] = c[i];
+  bar (17);
+  for (i = 0; i < N; i++)
+    if (a[i] != 6 - 5 * i + ((i & 31) << 4)
+	|| b[i] != 6 - 7 * i
+	|| c[i] != 23 - 2 * i + ((i & 31) << 4))
+      abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c	(.../trunk)	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-8.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -0,0 +1,94 @@ 
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#include "tree-vect.h"
+
+#ifndef N
+#define N 1024
+#endif
+
+int a[N], b[N];
+long int c[N];
+unsigned char d[N];
+
+#pragma omp declare simd simdlen(8) notinbranch
+__attribute__((noinline)) int
+foo (long int a, int b, int c)
+{
+  return a + b + c;
+}
+
+#pragma omp declare simd simdlen(8) notinbranch
+__attribute__((noinline)) long int
+bar (int a, int b, long int c)
+{
+  return a + b + c;
+}
+
+__attribute__((noinline)) void
+fn1 (void)
+{
+  int i;
+  #pragma omp simd
+  for (i = 0; i < N; i++)
+    a[i] = foo (c[i], a[i], b[i]) + 6;
+  #pragma omp simd
+  for (i = 0; i < N; i++)
+    c[i] = bar (a[i], b[i], c[i]) * 2;
+}
+
+__attribute__((noinline)) void
+fn2 (void)
+{
+  int i;
+  #pragma omp simd
+  for (i = 0; i < N; i++)
+    {
+      a[i] = foo (c[i], a[i], b[i]) + 6;
+      d[i]++;
+    }
+  #pragma omp simd
+  for (i = 0; i < N; i++)
+    {
+      c[i] = bar (a[i], b[i], c[i]) * 2;
+      d[i] /= 2;
+    }
+}
+
+__attribute__((noinline)) void
+fn3 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    {
+      a[i] = i * 2;
+      b[i] = 17 + (i % 37);
+      c[i] = (i & 63);
+      d[i] = 16 + i;
+    }
+}
+
+int
+main ()
+{
+  int i;
+  check_vect ();
+  fn3 ();
+  fn1 ();
+  for (i = 0; i < N; i++)
+    if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
+	|| b[i] != 17 + (i % 37)
+	|| c[i] != i * 4 + 80 + 4 * (i % 37) + 4 * (i & 63))
+      abort ();
+  fn3 ();
+  fn2 ();
+  for (i = 0; i < N; i++)
+    if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
+	|| b[i] != 17 + (i % 37)
+	|| c[i] != i * 4 + 80 + 4 * (i % 37) + 4 * (i & 63)
+	|| d[i] != ((unsigned char) (17 + i)) / 2)
+      abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- gcc/testsuite/gcc.dg/vect/vect-simd-clone-9.c	(.../trunk)	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/vect-simd-clone-9.c	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -0,0 +1,94 @@ 
+/* { dg-additional-options "-fopenmp-simd" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+#include "tree-vect.h"
+
+#ifndef N
+#define N 1024
+#endif
+
+int a[N], b[N];
+long int c[N];
+unsigned char d[N];
+
+#pragma omp declare simd notinbranch
+__attribute__((noinline)) static int
+foo (long int a, int b, int c)
+{
+  return a + b + c;
+}
+
+#pragma omp declare simd notinbranch
+__attribute__((noinline)) static long int
+bar (int a, int b, long int c)
+{
+  return a + b + c;
+}
+
+__attribute__((noinline)) void
+fn1 (void)
+{
+  int i;
+  #pragma omp simd
+  for (i = 0; i < N; i++)
+    a[i] = foo (c[i], a[i], b[i]) + 6;
+  #pragma omp simd
+  for (i = 0; i < N; i++)
+    c[i] = bar (a[i], b[i], c[i]) * 2;
+}
+
+__attribute__((noinline)) void
+fn2 (void)
+{
+  int i;
+  #pragma omp simd
+  for (i = 0; i < N; i++)
+    {
+      a[i] = foo (c[i], a[i], b[i]) + 6;
+      d[i]++;
+    }
+  #pragma omp simd
+  for (i = 0; i < N; i++)
+    {
+      c[i] = bar (a[i], b[i], c[i]) * 2;
+      d[i] /= 2;
+    }
+}
+
+__attribute__((noinline)) void
+fn3 (void)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    {
+      a[i] = i * 2;
+      b[i] = 17 + (i % 37);
+      c[i] = (i & 63);
+      d[i] = 16 + i;
+    }
+}
+
+int
+main ()
+{
+  int i;
+  check_vect ();
+  fn3 ();
+  fn1 ();
+  for (i = 0; i < N; i++)
+    if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
+	|| b[i] != 17 + (i % 37)
+	|| c[i] != i * 4 + 80 + 4 * (i % 37) + 4 * (i & 63))
+      abort ();
+  fn3 ();
+  fn2 ();
+  for (i = 0; i < N; i++)
+    if (a[i] != i * 2 + 23 + (i % 37) + (i & 63)
+	|| b[i] != 17 + (i % 37)
+	|| c[i] != i * 4 + 80 + 4 * (i % 37) + 4 * (i & 63)
+	|| d[i] != ((unsigned char) (17 + i)) / 2)
+      abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- gcc/testsuite/g++.dg/gomp/declare-simd-1.C	(.../trunk)	(revision 205223)
+++ gcc/testsuite/g++.dg/gomp/declare-simd-1.C	(.../branches/gomp-4_0-branch)	(revision 205231)
@@ -239,5 +239,5 @@  struct D
 void
 f38 (D &d)
 {
-  d.f37 <12> (6);
+  d.f37 <16> (6);
 }