Patchwork [gomp4] SIMD clauses patch committed

login
register
mail settings
Submitter Jakub Jelinek
Date June 28, 2013, 4:01 p.m.
Message ID <20130628160147.GO2336@tucnak.redhat.com>
Download mbox | patch
Permalink /patch/255447/
State New
Headers show

Comments

Jakub Jelinek - June 28, 2013, 4:01 p.m.
Hi!

The patch grew already too large, while doing something clever to avoid
unnecessary memory loads/stores in vectorized loop is still unimplemented,
the rest is and it seems to work now (at least on the added testcases),
so I've committed it to gomp-4_0-branch now.

If anyone has any issues with this approach, please let me know.

2013-06-28  Jakub Jelinek  <jakub@redhat.com>
	    Aldy Hernandez  <aldyh@redhat.com>

	* internal-fn.def (GOMP_SIMD_LANE, GOMP_SIMD_VF,
	GOMP_SIMD_LAST_LANE): New internal functions.
	* omp-low.c (omp_max_vf, lower_rec_simd_input_clauses): New
	functions.
	(lower_rec_input_clauses): Add fd argument.  Enforce max_vf = 1
	if any data sharing clauses mention VLAs or for array reductions.
	Handle OMP_CLAUSE__LOOPTEMP_ clause.  For
	OMP_CLAUSE_{{FIRST,LAST,}PRIVATE,LINEAR,REDUCTION} on SIMD
	constructs use "omp simd array" temporaries.  For OMP_CLAUSE_LINEAR
	adjust initial value in combined constructs.  Don't emit any
	barriers for #pragma omp distribute.  If max_vf is lower than
	current safelen, prepend an OMP_CLAUSE_SAFELEN clause.
	(lower_lastprivate_clauses): Handle "omp simd array" temporaries.
	(lower_reduction_clauses): Exit early for #pragma omp simd.
	(expand_omp_simd): Set loop->simduid from OMP_CLAUSE__SIMDUID_
	and cfun->has_simduid_loops if set.
	If OMP_CLAUSE_SAFELEN (1) is present, don't set loop->safelen
	nor loop->force_vect.
	(lower_omp_sections, lower_omp_single, lower_omp_taskreg): Adjust
	lower_rec_input_clauses callers.
	(lower_omp_for_lastprivate): Unshare vinit.
	(lower_omp_for): Add OMP_CLAUSE__LOOPTEMP_ clauses before calling
	lower_rec_input_clauses.  Adjust lower_rec_input_clauses caller.
	Always call lower_omp_for_lastprivate at the same place, even for
	#pragma omp simd.
	* tree.h (enum clause_code): Add OMP_CLAUSE__SIMDUID_.
	(OMP_CLAUSE__SIMDUID__DECL): Define.
	* tree-vectorizer.c: Include hash-table.h and tree-ssa-propagate.h.
	(simduid_to_vf, decl_to_simduid): New classes.
	(simduid_to_vf::hash, simduid_to_vf::equal, decl_to_simduid::hash,
	decl_to_simduid::equal): New methods.
	(note_simd_array_uses_struct): New struct.
	(adjust_simduid_builtins, note_simd_array_uses_cb,
	note_simd_array_uses): New functions.
	(vectorize_loops): Adjust "omp simd array" temporary array sizes
	and fold GOMP_SIMD_{LANE,VF,LAST_LANE} builtins.
	* tree-vectorizer.h (struct _stmt_vec_info): Add simd_lane_access_p
	field.
	(STMT_VINFO_SIMD_LANE_ACCESS_P): Define.
	* tree-data-ref.c (get_references_in_stmt): Allow GOMP_SIMD_LANE
	builtins in their own loops.
	* tree-inline.c (copy_cfg_body): Propagate has_force_vect_loops
	and has_simduid_loops.
	* function.h (struct function): Add has_simduid_loops field.
	* tree-ssa-ccp.c (likely_value): For GOMP_SIMD_{LANE,LAST_LANE,VF}
	builtins ignore the undefined magic argument.
	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE__SIMDUID_
	clause.
	* cfgloop.h (struct loop): Add simduid field.
	* Makefile.in (tree-vectorizer.o): Depend on $(HASH_TABLE_H)
	and tree-ssa-propagate.h.
	* tree-vect-data-refs.c (vect_analyze_data_refs): Check for SIMD
	lane access.
	* gimplify.c (omp_add_variable): Handle combination of aligned
	clause and some data sharing clause for the same decl.
	(gimplify_omp_for): For collapse (2) and above simd loops
	predetermine loop iteration vars as lastprivate instead of
	linear.
	* tree.c (omp_clause_num_ops, omp_clause_code_name): Add
	entries for OMP_CLAUSE__SIMDUID_.
	(walk_tree_1): Handle OMP_CLAUSE__SIMDUID_.
	* tree-vect-loop.c (vectorizable_live_operation): Handle live
	GOMP_SIMD_LANE result.
	* tree-vect-stmts.c (vectorizable_call): Vectorize GOMP_SIMD_LANE
	builtin.
	(vectorizable_store, vectorizable_load): Handle
	STMT_VINFO_SIMD_LANE_ACCESS_P.
	* internal-fn.c (expand_GOMP_SIMD_LANE, expand_GOMP_SIMD_VF,
	expand_GOMP_SIMD_LAST_LANE): New functions.

	* testsuite/libgomp.c++/simd-1.C: New test.
	* testsuite/libgomp.c++/simd-2.C: New test.
	* testsuite/libgomp.c++/simd-3.C: New test.


	Jakub
Aldy Hernandez - June 28, 2013, 5:36 p.m.
On 06/28/13 09:01, Jakub Jelinek wrote:
> Hi!
>
> The patch grew already too large, while doing something clever to avoid
> unnecessary memory loads/stores in vectorized loop is still unimplemented,
> the rest is and it seems to work now (at least on the added testcases),
> so I've committed it to gomp-4_0-branch now.
>
> If anyone has any issues with this approach, please let me know.
>

I'm cool with it.  Thanks for working on this.

I merged my aldyh/cilk-in-gomp branch with your current gomp-4_0-branch, 
and everything's peachy.  I also noticed you merged with trunk, so now 
we're all squared with Balaji's array notation work.  There was some 
small fallout, but I've fixed it all.

At this point, I wonder if it would be reasonable to merge my branch 
into gomp-4_0-branch, or if you'd rather keep it separate.

Also, are we at a point where we could perhaps merge Cilk Plus' <#pragma 
simd> into trunk (albeit, bringing in the corresponding machinery from 
gomp-4_0-branch first)?  The Cilk Plus work is incremental, so it's 
understandable to bring this in piecemeal (array notation, then pragma 
simd, then elemental functions, etc etc).

What are your thoughts?

Patch

--- gcc/internal-fn.def.jj	2013-06-26 12:09:40.067531623 +0200
+++ gcc/internal-fn.def	2013-06-26 13:20:56.873808820 +0200
@@ -40,3 +40,6 @@  along with GCC; see the file COPYING3.
 
 DEF_INTERNAL_FN (LOAD_LANES, ECF_CONST | ECF_LEAF)
 DEF_INTERNAL_FN (STORE_LANES, ECF_CONST | ECF_LEAF)
+DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW)
+DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
+DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
--- gcc/omp-low.c.jj	2013-06-26 12:13:59.533205674 +0200
+++ gcc/omp-low.c	2013-06-26 19:10:29.523028867 +0200
@@ -2508,6 +2508,73 @@  omp_clause_aligned_alignment (tree claus
   return build_int_cst (integer_type_node, al);
 }
 
+/* Return maximum possible vectorization factor for the target.  */
+
+static int
+omp_max_vf (void)
+{
+  if (!optimize
+      || optimize_debug
+      || (!flag_tree_vectorize
+	  && global_options_set.x_flag_tree_vectorize))
+    return 1;
+
+  int vs = targetm.vectorize.autovectorize_vector_sizes ();
+  if (vs)
+    {
+      vs = 1 << floor_log2 (vs);
+      return vs;
+    }
+  enum machine_mode vqimode = targetm.vectorize.preferred_simd_mode (QImode);
+  if (GET_MODE_CLASS (vqimode) == MODE_VECTOR_INT)
+    return GET_MODE_NUNITS (vqimode);
+  return 1;
+}
+
+/* Helper function of lower_rec_input_clauses, used for #pragma omp simd
+   privatization.  */
+
+static bool
+lower_rec_simd_input_clauses (tree new_var, omp_context *ctx, int &max_vf,
+			      tree &idx, tree &lane, tree &ivar, tree &lvar)
+{
+  if (max_vf == 0)
+    {
+      max_vf = omp_max_vf ();
+      if (max_vf > 1)
+	{
+	  tree c = find_omp_clause (gimple_omp_for_clauses (ctx->stmt),
+				    OMP_CLAUSE_SAFELEN);
+	  if (c
+	      && compare_tree_int (OMP_CLAUSE_SAFELEN_EXPR (c), max_vf) == -1)
+	    max_vf = tree_low_cst (OMP_CLAUSE_SAFELEN_EXPR (c), 0);
+	}
+      if (max_vf > 1)
+	{
+	  idx = create_tmp_var (unsigned_type_node, NULL);
+	  lane = create_tmp_var (unsigned_type_node, NULL);
+	}
+    }
+  if (max_vf == 1)
+    return false;
+
+  tree atype = build_array_type_nelts (TREE_TYPE (new_var), max_vf);
+  tree avar = create_tmp_var_raw (atype, NULL);
+  if (TREE_ADDRESSABLE (new_var))
+    TREE_ADDRESSABLE (avar) = 1;
+  DECL_ATTRIBUTES (avar)
+    = tree_cons (get_identifier ("omp simd array"), NULL,
+		 DECL_ATTRIBUTES (avar));
+  gimple_add_tmp_var (avar);
+  ivar = build4 (ARRAY_REF, TREE_TYPE (new_var), avar, idx,
+		 NULL_TREE, NULL_TREE);
+  lvar = build4 (ARRAY_REF, TREE_TYPE (new_var), avar, lane,
+		 NULL_TREE, NULL_TREE);
+  SET_DECL_VALUE_EXPR (new_var, lvar);
+  DECL_HAS_VALUE_EXPR_P (new_var) = 1;
+  return true;
+}
+
 /* Generate code to implement the input clauses, FIRSTPRIVATE and COPYIN,
    from the receiver (aka child) side and initializers for REFERENCE_TYPE
    private variables.  Initialization statements go in ILIST, while calls
@@ -2515,15 +2582,43 @@  omp_clause_aligned_alignment (tree claus
 
 static void
 lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
-			 omp_context *ctx)
+			 omp_context *ctx, struct omp_for_data *fd)
 {
   tree c, dtor, copyin_seq, x, ptr;
   bool copyin_by_ref = false;
   bool lastprivate_firstprivate = false;
   int pass;
+  bool is_simd = (gimple_code (ctx->stmt) == GIMPLE_OMP_FOR
+		  && gimple_omp_for_kind (ctx->stmt) == GF_OMP_FOR_KIND_SIMD);
+  int max_vf = 0;
+  tree lane = NULL_TREE, idx = NULL_TREE;
+  tree ivar = NULL_TREE, lvar = NULL_TREE;
+  gimple_seq llist[2] = { NULL, NULL };
 
   copyin_seq = NULL;
 
+  /* Enforce simdlen 1 in simd loops with data sharing clauses referencing
+     variable sized vars.  That is unnecessarily hard to support and very
+     unlikely to result in vectorized code anyway.  */
+  if (is_simd)
+    for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
+      switch (OMP_CLAUSE_CODE (c))
+	{
+	case OMP_CLAUSE_REDUCTION:
+	  if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
+	    max_vf = 1;
+	  /* FALLTHRU */
+	case OMP_CLAUSE_PRIVATE:
+	case OMP_CLAUSE_FIRSTPRIVATE:
+	case OMP_CLAUSE_LASTPRIVATE:
+	case OMP_CLAUSE_LINEAR:
+	  if (is_variable_sized (OMP_CLAUSE_DECL (c)))
+	    max_vf = 1;
+	  break;
+	default:
+	  continue;
+	}
+
   /* Do all the fixed sized types in the first pass, and the variable sized
      types in the second pass.  This makes sure that the scalar arguments to
      the variable sized types are processed before we use them in the
@@ -2553,7 +2648,11 @@  lower_rec_input_clauses (tree clauses, g
 	    case OMP_CLAUSE_COPYIN:
 	    case OMP_CLAUSE_REDUCTION:
 	    case OMP_CLAUSE_LINEAR:
+	      break;
 	    case OMP_CLAUSE__LOOPTEMP_:
+	      /* Handle _looptemp_ clauses only on parallel.  */
+	      if (fd)
+		continue;
 	      break;
 	    case OMP_CLAUSE_LASTPRIVATE:
 	      if (OMP_CLAUSE_LASTPRIVATE_FIRSTPRIVATE (c))
@@ -2737,6 +2836,34 @@  lower_rec_input_clauses (tree clauses, g
 		x = NULL;
 	    do_private:
 	      x = lang_hooks.decls.omp_clause_default_ctor (c, new_var, x);
+	      if (is_simd)
+		{
+		  tree y = lang_hooks.decls.omp_clause_dtor (c, new_var);
+		  if ((TREE_ADDRESSABLE (new_var) || x || y
+		       || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LASTPRIVATE)
+		      && lower_rec_simd_input_clauses (new_var, ctx, max_vf,
+						       idx, lane, ivar, lvar))
+		    {
+		      if (x)
+			x = lang_hooks.decls.omp_clause_default_ctor
+						(c, unshare_expr (ivar), x);
+		      if (x)
+			gimplify_and_add (x, &llist[0]);
+		      if (y)
+			{
+			  y = lang_hooks.decls.omp_clause_dtor (c, ivar);
+			  if (y)
+			    {
+			      gimple_seq tseq = NULL;
+
+			      dtor = y;
+			      gimplify_stmt (&dtor, &tseq);
+			      gimple_seq_add_seq (&llist[1], tseq);
+			    }
+			}
+		      break;
+		    }
+		}
 	      if (x)
 		gimplify_and_add (x, ilist);
 	      /* FALLTHRU */
@@ -2779,10 +2906,92 @@  lower_rec_input_clauses (tree clauses, g
 		}
 	    do_firstprivate:
 	      x = build_outer_var_ref (var, ctx);
+	      if (is_simd)
+		{
+		  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LINEAR
+		      && gimple_omp_for_combined_into_p (ctx->stmt))
+		    {
+		      tree stept = POINTER_TYPE_P (TREE_TYPE (x))
+				   ? sizetype : TREE_TYPE (x);
+		      tree t = fold_convert (stept,
+					     OMP_CLAUSE_LINEAR_STEP (c));
+		      tree c = find_omp_clause (clauses,
+						OMP_CLAUSE__LOOPTEMP_);
+		      gcc_assert (c);
+		      tree l = OMP_CLAUSE_DECL (c);
+		      if (fd->collapse == 1)
+			{
+			  tree n1 = fd->loop.n1;
+			  tree step = fd->loop.step;
+			  tree itype = TREE_TYPE (l);
+			  if (POINTER_TYPE_P (itype))
+			    itype = signed_type_for (itype);
+			  l = fold_build2 (MINUS_EXPR, itype, l, n1);
+			  if (TYPE_UNSIGNED (itype)
+			      && fd->loop.cond_code == GT_EXPR)
+			    l = fold_build2 (TRUNC_DIV_EXPR, itype,
+					     fold_build1 (NEGATE_EXPR,
+							  itype, l),
+					     fold_build1 (NEGATE_EXPR,
+							  itype, step));
+			  else
+			    l = fold_build2 (TRUNC_DIV_EXPR, itype, l, step);
+			}
+		      t = fold_build2 (MULT_EXPR, stept,
+				       fold_convert (stept, l), t);
+		      if (POINTER_TYPE_P (TREE_TYPE (x)))
+			x = fold_build2 (POINTER_PLUS_EXPR,
+					 TREE_TYPE (x), x, t);
+		      else
+			x = fold_build2 (PLUS_EXPR, TREE_TYPE (x), x, t);
+		    }
+
+		  if ((OMP_CLAUSE_CODE (c) != OMP_CLAUSE_LINEAR
+		       || TREE_ADDRESSABLE (new_var))
+		      && lower_rec_simd_input_clauses (new_var, ctx, max_vf,
+						       idx, lane, ivar, lvar))
+		    {
+		      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LINEAR)
+			{
+			  tree iv = create_tmp_var (TREE_TYPE (new_var), NULL);
+			  x = lang_hooks.decls.omp_clause_copy_ctor (c, iv, x);
+			  gimplify_and_add (x, ilist);
+			  gimple_stmt_iterator gsi
+			    = gsi_start_1 (gimple_omp_body_ptr (ctx->stmt));
+			  gimple g
+			    = gimple_build_assign (unshare_expr (lvar), iv);
+			  gsi_insert_before_without_update (&gsi, g,
+							    GSI_SAME_STMT);
+			  tree stept = POINTER_TYPE_P (TREE_TYPE (x))
+				       ? sizetype : TREE_TYPE (x);
+			  tree t = fold_convert (stept,
+						 OMP_CLAUSE_LINEAR_STEP (c));
+			  enum tree_code code = PLUS_EXPR;
+			  if (POINTER_TYPE_P (TREE_TYPE (new_var)))
+			    code = POINTER_PLUS_EXPR;
+			  g = gimple_build_assign_with_ops (code, iv, iv, t);
+			  gsi_insert_before_without_update (&gsi, g,
+							    GSI_SAME_STMT);
+			  break;
+			}
+		      x = lang_hooks.decls.omp_clause_copy_ctor
+						(c, unshare_expr (ivar), x);
+		      gimplify_and_add (x, &llist[0]);
+		      x = lang_hooks.decls.omp_clause_dtor (c, ivar);
+		      if (x)
+			{
+			  gimple_seq tseq = NULL;
+
+			  dtor = x;
+			  gimplify_stmt (&dtor, &tseq);
+			  gimple_seq_add_seq (&llist[1], tseq);
+			}
+		      break;
+		    }
+		}
 	      x = lang_hooks.decls.omp_clause_copy_ctor (c, new_var, x);
 	      gimplify_and_add (x, ilist);
 	      goto do_dtor;
-	      break;
 
 	    case OMP_CLAUSE__LOOPTEMP_:
 	      gcc_assert (is_parallel_ctx (ctx));
@@ -2805,6 +3014,8 @@  lower_rec_input_clauses (tree clauses, g
 		  tree placeholder = OMP_CLAUSE_REDUCTION_PLACEHOLDER (c);
 		  x = build_outer_var_ref (var, ctx);
 
+		  /* FIXME: Not handled yet.  */
+		  gcc_assert (!is_simd);
 		  if (is_reference (var))
 		    x = build_fold_addr_expr_loc (clause_loc, x);
 		  SET_DECL_VALUE_EXPR (placeholder, x);
@@ -2819,7 +3030,31 @@  lower_rec_input_clauses (tree clauses, g
 		{
 		  x = omp_reduction_init (c, TREE_TYPE (new_var));
 		  gcc_assert (TREE_CODE (TREE_TYPE (new_var)) != ARRAY_TYPE);
-		  gimplify_assign (new_var, x, ilist);
+		  if (is_simd
+		      && lower_rec_simd_input_clauses (new_var, ctx, max_vf,
+						       idx, lane, ivar, lvar))
+		    {
+		      enum tree_code code = OMP_CLAUSE_REDUCTION_CODE (c);
+		      tree ref = build_outer_var_ref (var, ctx);
+
+		      gimplify_assign (unshare_expr (ivar), x, &llist[0]);
+
+		      /* reduction(-:var) sums up the partial results, so it
+			 acts identically to reduction(+:var).  */
+		      if (code == MINUS_EXPR)
+			code = PLUS_EXPR;
+
+		      x = build2 (code, TREE_TYPE (ref), ref, ivar);
+		      ref = build_outer_var_ref (var, ctx);
+		      gimplify_assign (ref, x, &llist[1]);
+		    }
+		  else
+		    {
+		      gimplify_assign (new_var, x, ilist);
+		      if (is_simd)
+			gimplify_assign (build_outer_var_ref (var, ctx),
+					 new_var, dlist);
+		    }
 		}
 	      break;
 
@@ -2829,6 +3064,49 @@  lower_rec_input_clauses (tree clauses, g
 	}
     }
 
+  if (lane)
+    {
+      tree uid = create_tmp_var (ptr_type_node, "simduid");
+      gimple g
+	= gimple_build_call_internal (IFN_GOMP_SIMD_LANE, 1, uid);
+      gimple_call_set_lhs (g, lane);
+      gimple_stmt_iterator gsi = gsi_start_1 (gimple_omp_body_ptr (ctx->stmt));
+      gsi_insert_before_without_update (&gsi, g, GSI_SAME_STMT);
+      c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE__SIMDUID_);
+      OMP_CLAUSE__SIMDUID__DECL (c) = uid;
+      OMP_CLAUSE_CHAIN (c) = gimple_omp_for_clauses (ctx->stmt);
+      gimple_omp_for_set_clauses (ctx->stmt, c);
+      g = gimple_build_assign_with_ops (INTEGER_CST, lane,
+					build_int_cst (unsigned_type_node, 0),
+					NULL_TREE);
+      gimple_seq_add_stmt (ilist, g);
+      for (int i = 0; i < 2; i++)
+	if (llist[i])
+	  {
+	    tree vf = create_tmp_var (unsigned_type_node, NULL);
+	    g = gimple_build_call_internal (IFN_GOMP_SIMD_VF, 1, uid);
+	    gimple_call_set_lhs (g, vf);
+	    gimple_seq *seq = i == 0 ? ilist : dlist;
+	    gimple_seq_add_stmt (seq, g);
+	    tree t = build_int_cst (unsigned_type_node, 0);
+	    g = gimple_build_assign_with_ops (INTEGER_CST, idx, t, NULL_TREE);
+	    gimple_seq_add_stmt (seq, g);
+	    tree body = create_artificial_label (UNKNOWN_LOCATION);
+	    tree header = create_artificial_label (UNKNOWN_LOCATION);
+	    tree end = create_artificial_label (UNKNOWN_LOCATION);
+	    gimple_seq_add_stmt (seq, gimple_build_goto (header));
+	    gimple_seq_add_stmt (seq, gimple_build_label (body));
+	    gimple_seq_add_seq (seq, llist[i]);
+	    t = build_int_cst (unsigned_type_node, 1);
+	    g = gimple_build_assign_with_ops (PLUS_EXPR, idx, idx, t);
+	    gimple_seq_add_stmt (seq, g);
+	    gimple_seq_add_stmt (seq, gimple_build_label (header));
+	    g = gimple_build_cond (LT_EXPR, idx, vf, body, end);
+	    gimple_seq_add_stmt (seq, g);
+	    gimple_seq_add_stmt (seq, gimple_build_label (end));
+	  }
+    }
+
   /* The copyin sequence is not to be executed by the main thread, since
      that would result in self-copies.  Perhaps not visible to scalars,
      but it certainly is to C++ operator=.  */
@@ -2849,11 +3127,30 @@  lower_rec_input_clauses (tree clauses, g
      happens after firstprivate copying in all threads.  */
   if (copyin_by_ref || lastprivate_firstprivate)
     {
-      /* Don't add any barrier for #pragma omp simd.  */
+      /* Don't add any barrier for #pragma omp simd or
+	 #pragma omp distribute.  */
       if (gimple_code (ctx->stmt) != GIMPLE_OMP_FOR
-	  || gimple_omp_for_kind (ctx->stmt) != GF_OMP_FOR_KIND_SIMD)
+	  || gimple_omp_for_kind (ctx->stmt) == GF_OMP_FOR_KIND_FOR)
 	gimplify_and_add (build_omp_barrier (), ilist);
     }
+
+  /* If max_vf is non-NULL, then we can use only vectorization factor
+     up to the max_vf we chose.  So stick it into safelen clause.  */
+  if (max_vf)
+    {
+      tree c = find_omp_clause (gimple_omp_for_clauses (ctx->stmt),
+				OMP_CLAUSE_SAFELEN);
+      if (c == NULL_TREE
+	  || compare_tree_int (OMP_CLAUSE_SAFELEN_EXPR (c),
+			       max_vf) == 1)
+	{
+	  c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_SAFELEN);
+	  OMP_CLAUSE_SAFELEN_EXPR (c) = build_int_cst (integer_type_node,
+						       max_vf);
+	  OMP_CLAUSE_CHAIN (c) = gimple_omp_for_clauses (ctx->stmt);
+	  gimple_omp_for_set_clauses (ctx->stmt, c);
+	}
+    }
 }
 
 
@@ -2865,8 +3162,9 @@  static void
 lower_lastprivate_clauses (tree clauses, tree predicate, gimple_seq *stmt_list,
 			   omp_context *ctx)
 {
-  tree x, c, label = NULL;
+  tree x, c, label = NULL, orig_clauses = clauses;
   bool par_clauses = false;
+  tree simduid = NULL, lastlane = NULL;
 
   /* Early exit if there are no lastprivate or linear clauses.  */
   for (; clauses ; clauses = OMP_CLAUSE_CHAIN (clauses))
@@ -2910,6 +3208,14 @@  lower_lastprivate_clauses (tree clauses,
       gimple_seq_add_stmt (stmt_list, gimple_build_label (label_true));
     }
 
+  if (gimple_code (ctx->stmt) == GIMPLE_OMP_FOR
+      && gimple_omp_for_kind (ctx->stmt) == GF_OMP_FOR_KIND_SIMD)
+    {
+      simduid = find_omp_clause (orig_clauses, OMP_CLAUSE__SIMDUID_);
+      if (simduid)
+	simduid = OMP_CLAUSE__SIMDUID__DECL (simduid);
+    }
+
   for (c = clauses; c ;)
     {
       tree var, new_var;
@@ -2922,6 +3228,31 @@  lower_lastprivate_clauses (tree clauses,
 	  var = OMP_CLAUSE_DECL (c);
 	  new_var = lookup_decl (var, ctx);
 
+	  if (simduid && DECL_HAS_VALUE_EXPR_P (new_var))
+	    {
+	      tree val = DECL_VALUE_EXPR (new_var);
+	      if (TREE_CODE (val) == ARRAY_REF
+		  && VAR_P (TREE_OPERAND (val, 0))
+		  && lookup_attribute ("omp simd array",
+				       DECL_ATTRIBUTES (TREE_OPERAND (val,
+								      0))))
+		{
+		  if (lastlane == NULL)
+		    {
+		      lastlane = create_tmp_var (unsigned_type_node, NULL);
+		      gimple g
+			= gimple_build_call_internal (IFN_GOMP_SIMD_LAST_LANE,
+						      2, simduid,
+						      TREE_OPERAND (val, 1));
+		      gimple_call_set_lhs (g, lastlane);
+		      gimple_seq_add_stmt (stmt_list, g);
+		    }
+		  new_var = build4 (ARRAY_REF, TREE_TYPE (val),
+				    TREE_OPERAND (val, 0), lastlane,
+				    NULL_TREE, NULL_TREE);
+		}
+	    }
+
 	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LASTPRIVATE
 	      && OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (c))
 	    {
@@ -2971,6 +3302,11 @@  lower_reduction_clauses (tree clauses, g
   tree x, c;
   int count = 0;
 
+  /* SIMD reductions are handled in lower_rec_input_clauses.  */
+  if (gimple_code (ctx->stmt) == GIMPLE_OMP_FOR
+      && gimple_omp_for_kind (ctx->stmt) == GF_OMP_FOR_KIND_SIMD)
+    return;
+
   /* First see if there is exactly one reduction clause.  Use OMP_ATOMIC
      update in that case, otherwise use a lock.  */
   for (c = clauses; c && count < 2; c = OMP_CLAUSE_CHAIN (c))
@@ -5691,6 +6027,8 @@  expand_omp_simd (struct omp_region *regi
   int i;
   tree safelen = find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
 				  OMP_CLAUSE_SAFELEN);
+  tree simduid = find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
+				  OMP_CLAUSE__SIMDUID_);
   tree n1, n2;
 
   type = TREE_TYPE (fd->loop.v);
@@ -5902,11 +6240,19 @@  expand_omp_simd (struct omp_region *regi
 	    loop->safelen = INT_MAX;
 	  else
 	    loop->safelen = tree_low_cst (safelen, 1);
+	  if (loop->safelen == 1)
+	    loop->safelen = 0;
+	}
+      if (simduid)
+	{
+	  loop->simduid = OMP_CLAUSE__SIMDUID__DECL (simduid);
+	  cfun->has_simduid_loops = true;
 	}
       /* If not -fno-tree-vectorize, hint that we want to vectorize
 	 the loop.  */
-      if (flag_tree_vectorize
-	  || !global_options_set.x_flag_tree_vectorize)
+      if ((flag_tree_vectorize
+	   || !global_options_set.x_flag_tree_vectorize)
+	  && loop->safelen > 1)
 	{
 	  loop->force_vect = true;
 	  cfun->has_force_vect_loops = true;
@@ -7107,7 +7453,7 @@  lower_omp_sections (gimple_stmt_iterator
   dlist = NULL;
   ilist = NULL;
   lower_rec_input_clauses (gimple_omp_sections_clauses (stmt),
-      			   &ilist, &dlist, ctx);
+      			   &ilist, &dlist, ctx, NULL);
 
   new_body = gimple_omp_body (stmt);
   gimple_omp_set_body (stmt, NULL);
@@ -7315,7 +7661,7 @@  lower_omp_single (gimple_stmt_iterator *
   bind_body = NULL;
   dlist = NULL;
   lower_rec_input_clauses (gimple_omp_single_clauses (single_stmt),
-			   &bind_body, &dlist, ctx);
+			   &bind_body, &dlist, ctx, NULL);
   lower_omp (gimple_omp_body_ptr (single_stmt), ctx);
 
   gimple_seq_add_stmt (&bind_body, single_stmt);
@@ -7564,6 +7910,8 @@  lower_omp_for_lastprivate (struct omp_fo
 	  && host_integerp (fd->loop.n2, 0)
 	  && ! integer_zerop (fd->loop.n2))
 	vinit = build_int_cst (TREE_TYPE (fd->loop.v), 0);
+      else
+	vinit = unshare_expr (vinit);
 
       /* Initialize the iterator variable, so that threads that don't execute
 	 any iterations don't execute the lastprivate clauses by accident.  */
@@ -7578,7 +7926,7 @@  static void
 lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 {
   tree *rhs_p, block;
-  struct omp_for_data fd;
+  struct omp_for_data fd, *fdp = NULL;
   gimple stmt = gsi_stmt (*gsi_p), new_stmt;
   gimple_seq omp_for_body, body, dlist;
   size_t i;
@@ -7605,41 +7953,11 @@  lower_omp_for (gimple_stmt_iterator *gsi
       gimple_bind_append_vars (new_stmt, vars);
     }
 
-  /* The pre-body and input clauses go before the lowered GIMPLE_OMP_FOR.  */
-  dlist = NULL;
-  body = NULL;
-  lower_rec_input_clauses (gimple_omp_for_clauses (stmt), &body, &dlist, ctx);
-  gimple_seq_add_seq (&body, gimple_omp_for_pre_body (stmt));
-
-  lower_omp (gimple_omp_body_ptr (stmt), ctx);
-
-  /* Lower the header expressions.  At this point, we can assume that
-     the header is of the form:
-
-     	#pragma omp for (V = VAL1; V {<|>|<=|>=} VAL2; V = V [+-] VAL3)
-
-     We just need to make sure that VAL1, VAL2 and VAL3 are lowered
-     using the .omp_data_s mapping, if needed.  */
-  for (i = 0; i < gimple_omp_for_collapse (stmt); i++)
-    {
-      rhs_p = gimple_omp_for_initial_ptr (stmt, i);
-      if (!is_gimple_min_invariant (*rhs_p))
-	*rhs_p = get_formal_tmp_var (*rhs_p, &body);
-
-      rhs_p = gimple_omp_for_final_ptr (stmt, i);
-      if (!is_gimple_min_invariant (*rhs_p))
-	*rhs_p = get_formal_tmp_var (*rhs_p, &body);
-
-      rhs_p = &TREE_OPERAND (gimple_omp_for_incr (stmt, i), 1);
-      if (!is_gimple_min_invariant (*rhs_p))
-	*rhs_p = get_formal_tmp_var (*rhs_p, &body);
-    }
-
-  /* Once lowered, extract the bounds and clauses.  */
-  extract_omp_for_data (stmt, &fd, NULL);
-
   if (gimple_omp_for_combined_into_p (stmt))
     {
+      extract_omp_for_data (stmt, &fd, NULL);
+      fdp = &fd;
+
       /* We need two temporaries with fd.loop.v type (istart/iend)
 	 and then (fd.collapse - 1) temporaries with the same
 	 type for count2 ... countN-1 vars if not constant.  */
@@ -7674,8 +7992,41 @@  lower_omp_for (gimple_stmt_iterator *gsi
       *pc = clauses;
     }
 
-  if (gimple_omp_for_kind (fd.for_stmt) != GF_OMP_FOR_KIND_SIMD)
-    lower_omp_for_lastprivate (&fd, &body, &dlist, ctx);
+  /* The pre-body and input clauses go before the lowered GIMPLE_OMP_FOR.  */
+  dlist = NULL;
+  body = NULL;
+  lower_rec_input_clauses (gimple_omp_for_clauses (stmt), &body, &dlist, ctx,
+			   fdp);
+  gimple_seq_add_seq (&body, gimple_omp_for_pre_body (stmt));
+
+  lower_omp (gimple_omp_body_ptr (stmt), ctx);
+
+  /* Lower the header expressions.  At this point, we can assume that
+     the header is of the form:
+
+     	#pragma omp for (V = VAL1; V {<|>|<=|>=} VAL2; V = V [+-] VAL3)
+
+     We just need to make sure that VAL1, VAL2 and VAL3 are lowered
+     using the .omp_data_s mapping, if needed.  */
+  for (i = 0; i < gimple_omp_for_collapse (stmt); i++)
+    {
+      rhs_p = gimple_omp_for_initial_ptr (stmt, i);
+      if (!is_gimple_min_invariant (*rhs_p))
+	*rhs_p = get_formal_tmp_var (*rhs_p, &body);
+
+      rhs_p = gimple_omp_for_final_ptr (stmt, i);
+      if (!is_gimple_min_invariant (*rhs_p))
+	*rhs_p = get_formal_tmp_var (*rhs_p, &body);
+
+      rhs_p = &TREE_OPERAND (gimple_omp_for_incr (stmt, i), 1);
+      if (!is_gimple_min_invariant (*rhs_p))
+	*rhs_p = get_formal_tmp_var (*rhs_p, &body);
+    }
+
+  /* Once lowered, extract the bounds and clauses.  */
+  extract_omp_for_data (stmt, &fd, NULL);
+
+  lower_omp_for_lastprivate (&fd, &body, &dlist, ctx);
 
   gimple_seq_add_stmt (&body, stmt);
   gimple_seq_add_seq (&body, gimple_omp_body (stmt));
@@ -7685,20 +8036,13 @@  lower_omp_for (gimple_stmt_iterator *gsi
 
   /* After the loop, add exit clauses.  */
   lower_reduction_clauses (gimple_omp_for_clauses (stmt), &body, ctx);
+
   gimple_seq_add_seq (&body, dlist);
 
   body = maybe_catch_exception (body);
 
   /* Region exit marker goes at the end of the loop body.  */
   gimple_seq_add_stmt (&body, gimple_build_omp_return (fd.have_nowait));
-  if (gimple_omp_for_kind (fd.for_stmt) == GF_OMP_FOR_KIND_SIMD)
-    {
-      dlist = NULL;
-      lower_lastprivate_clauses (gimple_omp_for_clauses (fd.for_stmt),
-				 NULL_TREE, &dlist, ctx);
-      gimple_seq_add_seq (&body, dlist);
-    }
-
   pop_gimplify_context (new_stmt);
 
   gimple_bind_append_vars (new_stmt, ctx->block_vars);
@@ -8057,7 +8401,7 @@  lower_omp_taskreg (gimple_stmt_iterator
 
   par_olist = NULL;
   par_ilist = NULL;
-  lower_rec_input_clauses (clauses, &par_ilist, &par_olist, ctx);
+  lower_rec_input_clauses (clauses, &par_ilist, &par_olist, ctx, NULL);
   lower_omp (&par_body, ctx);
   if (gimple_code (stmt) == GIMPLE_OMP_PARALLEL)
     lower_reduction_clauses (clauses, &par_olist, ctx);
--- gcc/tree.h.jj	2013-06-26 12:13:54.382291138 +0200
+++ gcc/tree.h	2013-06-26 13:20:56.883807190 +0200
@@ -456,7 +456,10 @@  enum omp_clause_code
   OMP_CLAUSE_SECTIONS,
 
   /* OpenMP clause: taskgroup.  */
-  OMP_CLAUSE_TASKGROUP
+  OMP_CLAUSE_TASKGROUP,
+
+  /* Internally used only clause, holding SIMD uid.  */
+  OMP_CLAUSE__SIMDUID_
 };
 
 /* The definition of tree nodes fills the next several pages.  */
@@ -2001,6 +2004,9 @@  extern void protected_set_expr_location
 #define OMP_CLAUSE_SIMDLEN_EXPR(NODE) \
   OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_SIMDLEN), 0)
 
+#define OMP_CLAUSE__SIMDUID__DECL(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__SIMDUID_), 0)
+
 enum omp_clause_schedule_kind
 {
   OMP_CLAUSE_SCHEDULE_STATIC,
--- gcc/tree-vectorizer.c.jj	2013-06-26 12:15:46.232281123 +0200
+++ gcc/tree-vectorizer.c	2013-06-26 13:20:56.884807036 +0200
@@ -66,13 +66,209 @@  along with GCC; see the file COPYING3.
 #include "cfgloop.h"
 #include "tree-vectorizer.h"
 #include "tree-pass.h"
+#include "hash-table.h"
+#include "tree-ssa-propagate.h"
 
 /* Loop or bb location.  */
 LOC vect_location;
 
 /* Vector mapping GIMPLE stmt to stmt_vec_info. */
 vec<vec_void_p> stmt_vec_info_vec;
+
+/* For mapping simduid to vectorization factor.  */
+
+struct simduid_to_vf : typed_free_remove<simduid_to_vf>
+{
+  unsigned int simduid;
+  int vf;
+
+  /* hash_table support.  */
+  typedef simduid_to_vf value_type;
+  typedef simduid_to_vf compare_type;
+  static inline hashval_t hash (const value_type *);
+  static inline int equal (const value_type *, const compare_type *);
+};
+
+inline hashval_t
+simduid_to_vf::hash (const value_type *p)
+{
+  return p->simduid;
+}
+
+inline int
+simduid_to_vf::equal (const value_type *p1, const value_type *p2)
+{
+  return p1->simduid == p2->simduid;
+}
+
+/* For mapping decl to simduid.  */
+
+struct decl_to_simduid : typed_free_remove<decl_to_simduid>
+{
+  tree decl;
+  unsigned int simduid;
+
+  /* hash_table support.  */
+  typedef decl_to_simduid value_type;
+  typedef decl_to_simduid compare_type;
+  static inline hashval_t hash (const value_type *);
+  static inline int equal (const value_type *, const compare_type *);
+};
+
+inline hashval_t
+decl_to_simduid::hash (const value_type *p)
+{
+  return DECL_UID (p->decl);
+}
+
+inline int
+decl_to_simduid::equal (const value_type *p1, const value_type *p2)
+{
+  return p1->decl == p2->decl;
+}
+
+/* Fold IFN_GOMP_SIMD_LANE, IFN_GOMP_SIMD_VF and IFN_GOMP_SIMD_LAST_LANE
+   into their corresponding constants.  */
+
+static void
+adjust_simduid_builtins (hash_table <simduid_to_vf> &htab)
+{
+  basic_block bb;
+
+  FOR_EACH_BB (bb)
+    {
+      gimple_stmt_iterator i;
+
+      for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next (&i))
+	{
+	  unsigned int vf = 1;
+	  enum internal_fn ifn;
+	  gimple stmt = gsi_stmt (i);
+	  tree t;
+	  if (!is_gimple_call (stmt)
+	      || !gimple_call_internal_p (stmt))
+	    continue;
+	  ifn = gimple_call_internal_fn (stmt);
+	  switch (ifn)
+	    {
+	    case IFN_GOMP_SIMD_LANE:
+	    case IFN_GOMP_SIMD_VF:
+	    case IFN_GOMP_SIMD_LAST_LANE:
+	      break;
+	    default:
+	      continue;
+	    }
+	  tree arg = gimple_call_arg (stmt, 0);
+	  gcc_assert (arg != NULL_TREE);
+	  gcc_assert (TREE_CODE (arg) == SSA_NAME);
+	  simduid_to_vf *p = NULL, data;
+	  data.simduid = DECL_UID (SSA_NAME_VAR (arg));
+	  if (htab.is_created ())
+	    p = htab.find (&data);
+	  if (p)
+	    vf = p->vf;
+	  switch (ifn)
+	    {
+	    case IFN_GOMP_SIMD_VF:
+	      t = build_int_cst (unsigned_type_node, vf);
+	      break;
+	    case IFN_GOMP_SIMD_LANE:
+	      t = build_int_cst (unsigned_type_node, 0);
+	      break;
+	    case IFN_GOMP_SIMD_LAST_LANE:
+	      t = gimple_call_arg (stmt, 1);
+	      break;
+	    default:
+	      gcc_unreachable ();
+	    }
+	  update_call_from_tree (&i, t);
+	}
+    }
+}
 
+/* Helper structure for note_simd_array_uses.  */
+
+struct note_simd_array_uses_struct
+{
+  hash_table <decl_to_simduid> *htab;
+  unsigned int simduid;
+};
+
+/* Callback for note_simd_array_uses, called through walk_gimple_op.  */
+
+static tree
+note_simd_array_uses_cb (tree *tp, int *walk_subtrees, void *data)
+{
+  struct walk_stmt_info *wi = (struct walk_stmt_info *) data;
+  struct note_simd_array_uses_struct *ns
+    = (struct note_simd_array_uses_struct *) wi->info;
+
+  if (TYPE_P (*tp))
+    *walk_subtrees = 0;
+  else if (VAR_P (*tp)
+	   && lookup_attribute ("omp simd array", DECL_ATTRIBUTES (*tp))
+	   && DECL_CONTEXT (*tp) == current_function_decl)
+    {
+      decl_to_simduid data;
+      if (!ns->htab->is_created ())
+	ns->htab->create (15);
+      data.decl = *tp;
+      data.simduid = ns->simduid;
+      decl_to_simduid **slot = ns->htab->find_slot (&data, INSERT);
+      if (*slot == NULL)
+	{
+	  decl_to_simduid *p = XNEW (decl_to_simduid);
+	  *p = data;
+	  *slot = p;
+	}
+      else if ((*slot)->simduid != ns->simduid)
+	(*slot)->simduid = -1U;
+      *walk_subtrees = 0;
+    }
+  return NULL_TREE;
+}
+
+/* Find "omp simd array" temporaries and map them to corresponding
+   simduid.  */
+
+static void
+note_simd_array_uses (hash_table <decl_to_simduid> *htab)
+{
+  basic_block bb;
+  gimple_stmt_iterator gsi;
+  struct walk_stmt_info wi;
+  struct note_simd_array_uses_struct ns;
+
+  memset (&wi, 0, sizeof (wi));
+  wi.info = &ns;
+  ns.htab = htab;
+
+  FOR_EACH_BB (bb)
+    for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	gimple stmt = gsi_stmt (gsi);
+	if (!is_gimple_call (stmt) || !gimple_call_internal_p (stmt))
+	  continue;
+	switch (gimple_call_internal_fn (stmt))
+	  {
+	  case IFN_GOMP_SIMD_LANE:
+	  case IFN_GOMP_SIMD_VF:
+	  case IFN_GOMP_SIMD_LAST_LANE:
+	    break;
+	  default:
+	    continue;
+	  }
+	tree lhs = gimple_call_lhs (stmt);
+	if (lhs == NULL_TREE)
+	  continue;
+	imm_use_iterator use_iter;
+	gimple use_stmt;
+	ns.simduid = DECL_UID (SSA_NAME_VAR (gimple_call_arg (stmt, 0)));
+	FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, lhs)
+	  if (!is_gimple_debug (use_stmt))
+	    walk_gimple_op (use_stmt, note_simd_array_uses_cb, &wi);
+      }
+}
 
 /* Function vectorize_loops.
 
@@ -86,12 +282,21 @@  vectorize_loops (void)
   unsigned int vect_loops_num;
   loop_iterator li;
   struct loop *loop;
+  hash_table <simduid_to_vf> simduid_to_vf_htab;
+  hash_table <decl_to_simduid> decl_to_simduid_htab;
 
   vect_loops_num = number_of_loops (cfun);
 
   /* Bail out if there are no loops.  */
   if (vect_loops_num <= 1)
-    return 0;
+    {
+      if (cfun->has_simduid_loops)
+	adjust_simduid_builtins (simduid_to_vf_htab);
+      return 0;
+    }
+
+  if (cfun->has_simduid_loops)
+    note_simd_array_uses (&decl_to_simduid_htab);
 
   init_stmt_vec_info_vec ();
 
@@ -126,6 +331,17 @@  vectorize_loops (void)
 	/* Now that the loop has been vectorized, allow it to be unrolled
 	   etc.  */
 	loop->force_vect = false;
+
+	if (loop->simduid)
+	  {
+	    simduid_to_vf *simduid_to_vf_data = XNEW (simduid_to_vf);
+	    if (!simduid_to_vf_htab.is_created ())
+	      simduid_to_vf_htab.create (15);
+	    simduid_to_vf_data->simduid = DECL_UID (loop->simduid);
+	    simduid_to_vf_data->vf = loop_vinfo->vectorization_factor;
+	    *simduid_to_vf_htab.find_slot (simduid_to_vf_data, INSERT)
+	      = simduid_to_vf_data;
+	  }
       }
 
   vect_location = UNKNOWN_LOC;
@@ -153,6 +369,40 @@  vectorize_loops (void)
 
   free_stmt_vec_info_vec ();
 
+  /* Fold IFN_GOMP_SIMD_{VF,LANE,LAST_LANE} builtins.  */
+  if (cfun->has_simduid_loops)
+    adjust_simduid_builtins (simduid_to_vf_htab);
+
+  /* Shrink any "omp array simd" temporary arrays to the
+     actual vectorization factors.  */
+  if (decl_to_simduid_htab.is_created ())
+    {
+      for (hash_table <decl_to_simduid>::iterator iter
+	   = decl_to_simduid_htab.begin ();
+	   iter != decl_to_simduid_htab.end (); ++iter)
+	if ((*iter).simduid != -1U)
+	  {
+	    tree decl = (*iter).decl;
+	    int vf = 1;
+	    if (simduid_to_vf_htab.is_created ())
+	      {
+		simduid_to_vf *p = NULL, data;
+		data.simduid = (*iter).simduid;
+		p = simduid_to_vf_htab.find (&data);
+		if (p)
+		  vf = p->vf;
+	      }
+	    tree atype
+	      = build_array_type_nelts (TREE_TYPE (TREE_TYPE (decl)), vf);
+	    TREE_TYPE (decl) = atype;
+	    relayout_decl (decl);
+	  }
+
+      decl_to_simduid_htab.dispose ();
+    }
+  if (simduid_to_vf_htab.is_created ())
+    simduid_to_vf_htab.dispose ();
+
   if (num_vectorized_loops > 0)
     {
       /* If we vectorized any loop only virtual SSA form needs to be updated.
--- gcc/tree-vectorizer.h.jj	2013-06-26 12:09:40.141530406 +0200
+++ gcc/tree-vectorizer.h	2013-06-26 13:20:56.885806885 +0200
@@ -576,6 +576,9 @@  typedef struct _stmt_vec_info {
   /* For loads only, true if this is a gather load.  */
   bool gather_p;
   bool stride_load_p;
+
+  /* For both loads and stores.  */
+  bool simd_lane_access_p;
 } *stmt_vec_info;
 
 /* Access Functions.  */
@@ -591,6 +594,7 @@  typedef struct _stmt_vec_info {
 #define STMT_VINFO_DATA_REF(S)             (S)->data_ref_info
 #define STMT_VINFO_GATHER_P(S)		   (S)->gather_p
 #define STMT_VINFO_STRIDE_LOAD_P(S)	   (S)->stride_load_p
+#define STMT_VINFO_SIMD_LANE_ACCESS_P(S)   (S)->simd_lane_access_p
 
 #define STMT_VINFO_DR_BASE_ADDRESS(S)      (S)->dr_base_address
 #define STMT_VINFO_DR_INIT(S)              (S)->dr_init
--- gcc/tree-data-ref.c.jj	2013-06-26 12:09:40.106530982 +0200
+++ gcc/tree-data-ref.c	2013-06-26 13:20:56.887806584 +0200
@@ -4331,10 +4331,25 @@  get_references_in_stmt (gimple stmt, vec
   /* ASM_EXPR and CALL_EXPR may embed arbitrary side effects.
      As we cannot model data-references to not spelled out
      accesses give up if they may occur.  */
-  if ((stmt_code == GIMPLE_CALL
-       && !(gimple_call_flags (stmt) & ECF_CONST))
-      || (stmt_code == GIMPLE_ASM
-	  && (gimple_asm_volatile_p (stmt) || gimple_vuse (stmt))))
+  if (stmt_code == GIMPLE_CALL
+      && !(gimple_call_flags (stmt) & ECF_CONST))
+    {
+      /* Allow IFN_GOMP_SIMD_LANE in their own loops.  */
+      if (gimple_call_internal_p (stmt)
+	  && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE)
+	{
+	  struct loop *loop = gimple_bb (stmt)->loop_father;
+	  tree uid = gimple_call_arg (stmt, 0);
+	  gcc_assert (TREE_CODE (uid) == SSA_NAME);
+	  if (loop == NULL
+	      || loop->simduid != SSA_NAME_VAR (uid))
+	    clobbers_memory = true;
+	}
+      else
+	clobbers_memory = true;
+    }
+  else if (stmt_code == GIMPLE_ASM
+	   && (gimple_asm_volatile_p (stmt) || gimple_vuse (stmt)))
     clobbers_memory = true;
 
   if (!gimple_vuse (stmt))
--- gcc/tree-inline.c.jj	2013-06-26 12:15:59.083062913 +0200
+++ gcc/tree-inline.c	2013-06-26 13:20:56.889806290 +0200
@@ -2345,6 +2345,8 @@  copy_cfg_body (copy_body_data * id, gcov
 		  get_loop (src_cfun, 0));
       /* Defer to cfgcleanup to update loop-father fields of basic-blocks.  */
       loops_state_set (LOOPS_NEED_FIXUP);
+      cfun->has_force_vect_loops |= src_cfun->has_force_vect_loops;
+      cfun->has_simduid_loops |= src_cfun->has_simduid_loops;
     }
 
   /* If the loop tree in the source function needed fixup, mark the
--- gcc/function.h.jj	2013-06-26 12:09:40.005532641 +0200
+++ gcc/function.h	2013-06-26 13:20:56.891806001 +0200
@@ -654,6 +654,10 @@  struct GTY(()) function {
   /* Nonzero if the current function contains any loops with
      loop->force_vect set.  */
   unsigned int has_force_vect_loops : 1;
+
+  /* Nonzero if the current function contains any loops with
+     nonzero value in loop->simduid.  */
+  unsigned int has_simduid_loops : 1;
 };
 
 /* Add the decl D to the local_decls list of FUN.  */
--- gcc/tree-ssa-ccp.c.jj	2013-06-26 12:09:39.952533516 +0200
+++ gcc/tree-ssa-ccp.c	2013-06-26 13:20:56.891806001 +0200
@@ -626,6 +626,22 @@  likely_value (gimple stmt)
   if (has_constant_operand)
     all_undefined_operands = false;
 
+  if (has_undefined_operand
+      && code == GIMPLE_CALL
+      && gimple_call_internal_p (stmt))
+    switch (gimple_call_internal_fn (stmt))
+      {
+	/* These 3 builtins use the first argument just as a magic
+	   way how to find out a decl uid.  */
+      case IFN_GOMP_SIMD_LANE:
+      case IFN_GOMP_SIMD_VF:
+      case IFN_GOMP_SIMD_LAST_LANE:
+	has_undefined_operand = false;
+	break;
+      default:
+	break;
+      }
+
   /* If the operation combines operands like COMPLEX_EXPR make sure to
      not mark the result UNDEFINED if only one part of the result is
      undefined.  */
--- gcc/tree-pretty-print.c.jj	2013-06-26 12:09:41.457508743 +0200
+++ gcc/tree-pretty-print.c	2013-06-26 13:20:56.893805656 +0200
@@ -595,6 +595,13 @@  dump_omp_clause (pretty_printer *buffer,
       pp_character (buffer, ')');
       break;
 
+    case OMP_CLAUSE__SIMDUID_:
+      pp_string (buffer, "_simduid_(");
+      dump_generic_node (buffer, OMP_CLAUSE__SIMDUID__DECL (clause),
+			 spc, flags, false);
+      pp_character (buffer, ')');
+      break;
+
     case OMP_CLAUSE_INBRANCH:
       pp_string (buffer, "inbranch");
       break;
--- gcc/cfgloop.h.jj	2013-06-26 12:09:40.235528856 +0200
+++ gcc/cfgloop.h	2013-06-26 13:20:56.893805656 +0200
@@ -177,6 +177,11 @@  struct GTY ((chain_next ("%h.next"))) lo
   /* True if we should try harder to vectorize this loop.  */
   bool force_vect;
 
+  /* For SIMD loops, this is a unique identifier of the loop, referenced
+     by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE
+     builtins.  */
+  tree simduid;
+
   /* Upper bound on number of iterations of a loop.  */
   struct nb_iter_bound *bounds;
 
--- gcc/Makefile.in.jj	2013-06-26 12:16:01.439023086 +0200
+++ gcc/Makefile.in	2013-06-26 13:20:56.895805318 +0200
@@ -2637,7 +2637,7 @@  tree-vect-data-refs.o: tree-vect-data-re
 tree-vectorizer.o: tree-vectorizer.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    $(DUMPFILE_H) $(TM_H) $(GGC_H) $(TREE_H) $(TREE_FLOW_H) \
    $(CFGLOOP_H) $(TREE_PASS_H) $(TREE_VECTORIZER_H) \
-   $(TREE_PRETTY_PRINT_H)
+   $(TREE_PRETTY_PRINT_H) $(HASH_TABLE_H) tree-ssa-propagate.h
 tree-loop-distribution.o: tree-loop-distribution.c $(CONFIG_H) $(SYSTEM_H) \
    coretypes.h $(TREE_FLOW_H) $(CFGLOOP_H) $(TREE_DATA_REF_H) $(TREE_PASS_H)
 tree-parloops.o: tree-parloops.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
--- gcc/tree-vect-data-refs.c.jj	2013-06-26 12:15:48.416244769 +0200
+++ gcc/tree-vect-data-refs.c	2013-06-26 13:20:56.898804836 +0200
@@ -2877,6 +2877,7 @@  vect_analyze_data_refs (loop_vec_info lo
       stmt_vec_info stmt_info;
       tree base, offset, init;
       bool gather = false;
+      bool simd_lane_access = false;
       int vf;
 
 again:
@@ -2908,12 +2909,17 @@  again:
       if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_INIT (dr)
 	  || !DR_STEP (dr))
         {
-	  /* If target supports vector gather loads, see if they can't
-	     be used.  */
-	  if (loop_vinfo
-	      && DR_IS_READ (dr)
+	  bool maybe_gather
+	    = DR_IS_READ (dr)
 	      && !TREE_THIS_VOLATILE (DR_REF (dr))
-	      && targetm.vectorize.builtin_gather != NULL
+	      && targetm.vectorize.builtin_gather != NULL;
+	  bool maybe_simd_lane_access
+	    = loop_vinfo && loop->simduid;
+
+	  /* If target supports vector gather loads, or if this might be
+	     a SIMD lane access, see if they can't be used.  */
+	  if (loop_vinfo
+	      && (maybe_gather || maybe_simd_lane_access)
 	      && !nested_in_vect_loop_p (loop, stmt))
 	    {
 	      struct data_reference *newdr
@@ -2926,14 +2932,59 @@  again:
 		  && DR_STEP (newdr)
 		  && integer_zerop (DR_STEP (newdr)))
 		{
-		  dr = newdr;
-		  gather = true;
+		  if (maybe_simd_lane_access)
+		    {
+		      tree off = DR_OFFSET (newdr);
+		      STRIP_NOPS (off);
+		      if (TREE_CODE (DR_INIT (newdr)) == INTEGER_CST
+			  && TREE_CODE (off) == MULT_EXPR
+			  && host_integerp (TREE_OPERAND (off, 1), 1))
+			{
+			  tree step = TREE_OPERAND (off, 1);
+			  off = TREE_OPERAND (off, 0);
+			  STRIP_NOPS (off);
+			  if (CONVERT_EXPR_P (off)
+			      && TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (off,
+									  0)))
+				 < TYPE_PRECISION (TREE_TYPE (off)))
+			    off = TREE_OPERAND (off, 0);
+			  if (TREE_CODE (off) == SSA_NAME)
+			    {
+			      gimple def = SSA_NAME_DEF_STMT (off);
+			      tree reft = TREE_TYPE (DR_REF (newdr));
+			      if (gimple_call_internal_p (def)
+				  && gimple_call_internal_fn (def)
+				  == IFN_GOMP_SIMD_LANE)
+				{
+				  tree arg = gimple_call_arg (def, 0);
+				  gcc_assert (TREE_CODE (arg) == SSA_NAME);
+				  arg = SSA_NAME_VAR (arg);
+				  if (arg == loop->simduid
+				      /* For now.  */
+				      && tree_int_cst_equal
+					   (TYPE_SIZE_UNIT (reft),
+					    step))
+				    {
+				      DR_OFFSET (newdr) = ssize_int (0);
+				      DR_STEP (newdr) = step;
+				      dr = newdr;
+				      simd_lane_access = true;
+				    }
+				}
+			    }
+			}
+		    }
+		  if (!simd_lane_access && maybe_gather)
+		    {
+		      dr = newdr;
+		      gather = true;
+		    }
 		}
-	      else
+	      if (!gather && !simd_lane_access)
 		free_data_ref (newdr);
 	    }
 
-	  if (!gather)
+	  if (!gather && !simd_lane_access)
 	    {
 	      if (dump_enabled_p ())
 		{
@@ -2960,7 +3011,7 @@  again:
           if (bb_vinfo)
 	    break;
 
-	  if (gather)
+	  if (gather || simd_lane_access)
 	    free_data_ref (dr);
 	  return false;
         }
@@ -2993,7 +3044,7 @@  again:
           if (bb_vinfo)
 	    break;
 
-	  if (gather)
+	  if (gather || simd_lane_access)
 	    free_data_ref (dr);
           return false;
         }
@@ -3012,7 +3063,7 @@  again:
           if (bb_vinfo)
 	    break;
 
-	  if (gather)
+	  if (gather || simd_lane_access)
 	    free_data_ref (dr);
           return false;
 	}
@@ -3033,7 +3084,7 @@  again:
 	  if (bb_vinfo)
 	    break;
 
-	  if (gather)
+	  if (gather || simd_lane_access)
 	    free_data_ref (dr);
 	  return false;
 	}
@@ -3168,12 +3219,17 @@  again:
           if (bb_vinfo)
 	    break;
 
-	  if (gather)
+	  if (gather || simd_lane_access)
 	    free_data_ref (dr);
           return false;
         }
 
       STMT_VINFO_DATA_REF (stmt_info) = dr;
+      if (simd_lane_access)
+	{
+	  STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) = true;
+	  datarefs[i] = dr;
+	}
 
       /* Set vectype for STMT.  */
       scalar_type = TREE_TYPE (DR_REF (dr));
@@ -3194,7 +3250,7 @@  again:
           if (bb_vinfo)
 	    break;
 
-	  if (gather)
+	  if (gather || simd_lane_access)
 	    {
 	      STMT_VINFO_DATA_REF (stmt_info) = NULL;
 	      free_data_ref (dr);
--- gcc/gimplify.c.jj	2013-06-26 12:09:40.028532266 +0200
+++ gcc/gimplify.c	2013-06-26 13:20:56.901804384 +0200
@@ -5814,7 +5814,7 @@  omp_add_variable (struct gimplify_omp_ct
     flags |= GOVD_SEEN;
 
   n = splay_tree_lookup (ctx->variables, (splay_tree_key)decl);
-  if (n != NULL)
+  if (n != NULL && n->value != GOVD_ALIGNED)
     {
       /* We shouldn't be re-adding the decl with the same data
 	 sharing class.  */
@@ -5823,7 +5823,8 @@  omp_add_variable (struct gimplify_omp_ct
 	 FIRSTPRIVATE and LASTPRIVATE.  */
       nflags = n->value | flags;
       gcc_assert ((nflags & GOVD_DATA_SHARE_CLASS)
-		  == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE));
+		  == (GOVD_FIRSTPRIVATE | GOVD_LASTPRIVATE)
+		  || (flags & GOVD_DATA_SHARE_CLASS) == 0);
       n->value = nflags;
       return;
     }
@@ -5893,7 +5894,10 @@  omp_add_variable (struct gimplify_omp_ct
 	}
     }
 
-  splay_tree_insert (ctx->variables, (splay_tree_key)decl, flags);
+  if (n != NULL)
+    n->value |= flags;
+  else
+    splay_tree_insert (ctx->variables, (splay_tree_key)decl, flags);
 }
 
 /* Notice a threadprivate variable DECL used in OpenMP context CTX.
@@ -6935,7 +6939,7 @@  gimplify_omp_for (tree *expr_p, gimple_s
 	  omp_is_private (gimplify_omp_ctxp, decl, simd);
 	  if (n != NULL && (n->value & GOVD_DATA_SHARE_CLASS) != 0)
 	    omp_notice_variable (gimplify_omp_ctxp, decl, true);
-	  else
+	  else if (TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)) == 1)
 	    {
 	      c = build_omp_clause (input_location, OMP_CLAUSE_LINEAR);
 	      OMP_CLAUSE_LINEAR_NO_COPYIN (c) = 1;
@@ -6948,6 +6952,21 @@  gimplify_omp_for (tree *expr_p, gimple_s
 	      omp_add_variable (gimplify_omp_ctxp, decl,
 				GOVD_LINEAR | GOVD_EXPLICIT | GOVD_SEEN);
 	    }
+	  else
+	    {
+	      bool lastprivate
+		= (!has_decl_expr
+		   || !bitmap_bit_p (has_decl_expr, DECL_UID (decl)));
+	      c = build_omp_clause (input_location,
+				    lastprivate ? OMP_CLAUSE_LASTPRIVATE
+						: OMP_CLAUSE_PRIVATE);
+	      OMP_CLAUSE_DECL (c) = decl;
+	      OMP_CLAUSE_CHAIN (c) = OMP_FOR_CLAUSES (for_stmt);
+	      omp_add_variable (gimplify_omp_ctxp, decl,
+				(lastprivate ? GOVD_LASTPRIVATE : GOVD_PRIVATE)
+				| GOVD_SEEN);
+	      c = NULL_TREE;
+	    }
 	}
       else if (omp_is_private (gimplify_omp_ctxp, decl, simd))
 	omp_notice_variable (gimplify_omp_ctxp, decl, true);
--- gcc/tree.c.jj	2013-06-26 12:13:53.989297657 +0200
+++ gcc/tree.c	2013-06-26 13:20:56.903804097 +0200
@@ -266,7 +266,8 @@  unsigned const char omp_clause_num_ops[]
   0, /* OMP_CLAUSE_FOR  */
   0, /* OMP_CLAUSE_PARALLEL  */
   0, /* OMP_CLAUSE_SECTIONS  */
-  0  /* OMP_CLAUSE_TASKGROUP  */
+  0, /* OMP_CLAUSE_TASKGROUP  */
+  1, /* OMP_CLAUSE__SIMDUID_  */
 };
 
 const char * const omp_clause_code_name[] =
@@ -309,7 +310,8 @@  const char * const omp_clause_code_name[
   "for",
   "parallel",
   "sections",
-  "taskgroup"
+  "taskgroup",
+  "_simduid_"
 };
 
 
@@ -11133,6 +11135,7 @@  walk_tree_1 (tree *tp, walk_tree_fn func
 	case OMP_CLAUSE_SAFELEN:
 	case OMP_CLAUSE_SIMDLEN:
 	case OMP_CLAUSE__LOOPTEMP_:
+	case OMP_CLAUSE__SIMDUID_:
 	  WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, 0));
 	  /* FALLTHRU */
 
--- gcc/tree-vect-loop.c.jj	2013-06-26 12:15:47.697256876 +0200
+++ gcc/tree-vect-loop.c	2013-06-26 13:20:56.905803819 +0200
@@ -5361,7 +5361,7 @@  vectorizable_induction (gimple phi, gimp
 bool
 vectorizable_live_operation (gimple stmt,
 			     gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED,
-			     gimple *vec_stmt ATTRIBUTE_UNUSED)
+			     gimple *vec_stmt)
 {
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
@@ -5381,7 +5381,41 @@  vectorizable_live_operation (gimple stmt
     return false;
 
   if (!is_gimple_assign (stmt))
-    return false;
+    {
+      if (gimple_call_internal_p (stmt)
+	  && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE
+	  && gimple_call_lhs (stmt)
+	  && loop->simduid
+	  && TREE_CODE (gimple_call_arg (stmt, 0)) == SSA_NAME
+	  && loop->simduid
+	     == SSA_NAME_VAR (gimple_call_arg (stmt, 0)))
+	{
+	  edge e = single_exit (loop);
+	  basic_block merge_bb = e->dest;
+	  imm_use_iterator imm_iter;
+	  use_operand_p use_p;
+	  tree lhs = gimple_call_lhs (stmt);
+
+	  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
+	    {
+	      gimple use_stmt = USE_STMT (use_p);
+	      if (gimple_code (use_stmt) == GIMPLE_PHI
+		  || gimple_bb (use_stmt) == merge_bb)
+		{
+		  if (vec_stmt)
+		    {
+		      tree vfm1
+			= build_int_cst (unsigned_type_node,
+					 loop_vinfo->vectorization_factor - 1);
+		      SET_PHI_ARG_DEF (use_stmt, e->dest_idx, vfm1);
+		    }
+		  return true;
+		}
+	    }
+	}
+
+      return false;
+    }
 
   if (TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME)
     return false;
--- gcc/tree-vect-stmts.c.jj	2013-06-27 10:14:00.641784465 +0200
+++ gcc/tree-vect-stmts.c	2013-06-27 20:13:22.386855317 +0200
@@ -1755,6 +1755,14 @@  vectorizable_call (gimple stmt, gimple_s
   if (nargs == 0 || nargs > 3)
     return false;
 
+  /* Ignore the argument of IFN_GOMP_SIMD_LANE, it is magic.  */
+  if (gimple_call_internal_p (stmt)
+      && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE)
+    {
+      nargs = 0;
+      rhs_type = unsigned_type_node;
+    }
+
   for (i = 0; i < nargs; i++)
     {
       tree opvectype;
@@ -1830,11 +1838,26 @@  vectorizable_call (gimple stmt, gimple_s
   fndecl = vectorizable_function (stmt, vectype_out, vectype_in);
   if (fndecl == NULL_TREE)
     {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                         "function is not vectorizable.");
-
-      return false;
+      if (gimple_call_internal_p (stmt)
+	  && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE
+	  && !slp_node
+	  && loop_vinfo
+	  && LOOP_VINFO_LOOP (loop_vinfo)->simduid
+	  && TREE_CODE (gimple_call_arg (stmt, 0)) == SSA_NAME
+	  && LOOP_VINFO_LOOP (loop_vinfo)->simduid
+	     == SSA_NAME_VAR (gimple_call_arg (stmt, 0)))
+	{
+	  /* We can handle IFN_GOMP_SIMD_LANE by returning a
+	     { 0, 1, 2, ... vf - 1 } vector.  */
+	  gcc_assert (nargs == 0);
+	}
+      else
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "function is not vectorizable.");
+	  return false;
+	}
     }
 
   gcc_assert (!gimple_vuse (stmt));
@@ -1932,9 +1955,30 @@  vectorizable_call (gimple stmt, gimple_s
 	      vargs.quick_push (vec_oprnd0);
 	    }
 
-	  new_stmt = gimple_build_call_vec (fndecl, vargs);
-	  new_temp = make_ssa_name (vec_dest, new_stmt);
-	  gimple_call_set_lhs (new_stmt, new_temp);
+	  if (gimple_call_internal_p (stmt)
+	      && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE)
+	    {
+	      tree *v = XALLOCAVEC (tree, nunits_out);
+	      int k;
+	      for (k = 0; k < nunits_out; ++k)
+		v[k] = build_int_cst (unsigned_type_node, j * nunits_out + k);
+	      tree cst = build_vector (vectype_out, v);
+	      tree new_var
+		= vect_get_new_vect_var (vectype_out, vect_simple_var, "cst_");
+	      gimple init_stmt = gimple_build_assign (new_var, cst);
+	      new_temp = make_ssa_name (new_var, init_stmt);
+	      gimple_assign_set_lhs (init_stmt, new_temp);
+	      vect_init_vector_1 (stmt, init_stmt, NULL);
+	      new_temp = make_ssa_name (vec_dest, NULL);
+	      new_stmt = gimple_build_assign (new_temp,
+					      gimple_assign_lhs (init_stmt));
+	    }
+	  else
+	    {
+	      new_stmt = gimple_build_call_vec (fndecl, vargs);
+	      new_temp = make_ssa_name (vec_dest, new_stmt);
+	      gimple_call_set_lhs (new_stmt, new_temp);
+	    }
 	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
 
 	  if (j == 0)
@@ -3796,6 +3840,7 @@  vectorizable_store (gimple stmt, gimple_
   enum vect_def_type dt;
   stmt_vec_info prev_stmt_info = NULL;
   tree dataref_ptr = NULL_TREE;
+  tree dataref_offset = NULL_TREE;
   gimple ptr_incr = NULL;
   int nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies;
@@ -4085,9 +4130,26 @@  vectorizable_store (gimple stmt, gimple_
 	  /* We should have catched mismatched types earlier.  */
 	  gcc_assert (useless_type_conversion_p (vectype,
 						 TREE_TYPE (vec_oprnd)));
-	  dataref_ptr = vect_create_data_ref_ptr (first_stmt, aggr_type, NULL,
-						  NULL_TREE, &dummy, gsi,
-						  &ptr_incr, false, &inv_p);
+	  bool simd_lane_access_p
+	    = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info);
+	  if (simd_lane_access_p
+	      && TREE_CODE (DR_BASE_ADDRESS (first_dr)) == ADDR_EXPR
+	      && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr), 0))
+	      && integer_zerop (DR_OFFSET (first_dr))
+	      && integer_zerop (DR_INIT (first_dr))
+	      && alias_sets_conflict_p (get_alias_set (aggr_type),
+					get_alias_set (DR_REF (first_dr))))
+	    {
+	      dataref_ptr = unshare_expr (DR_BASE_ADDRESS (first_dr));
+	      dataref_offset = build_int_cst (reference_alias_ptr_type
+					      (DR_REF (first_dr)), 0);
+	    }
+	  else
+	    dataref_ptr
+	      = vect_create_data_ref_ptr (first_stmt, aggr_type,
+					  simd_lane_access_p ? loop : NULL,
+					  NULL_TREE, &dummy, gsi, &ptr_incr,
+					  simd_lane_access_p, &inv_p);
 	  gcc_assert (bb_vinfo || !inv_p);
 	}
       else
@@ -4108,8 +4170,13 @@  vectorizable_store (gimple stmt, gimple_
 	      dr_chain[i] = vec_oprnd;
 	      oprnds[i] = vec_oprnd;
 	    }
-	  dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
-					 TYPE_SIZE_UNIT (aggr_type));
+	  if (dataref_offset)
+	    dataref_offset
+	      = int_const_binop (PLUS_EXPR, dataref_offset,
+				 TYPE_SIZE_UNIT (aggr_type));
+	  else
+	    dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
+					   TYPE_SIZE_UNIT (aggr_type));
 	}
 
       if (store_lanes_p)
@@ -4161,8 +4228,10 @@  vectorizable_store (gimple stmt, gimple_
 		vec_oprnd = result_chain[i];
 
 	      data_ref = build2 (MEM_REF, TREE_TYPE (vec_oprnd), dataref_ptr,
-				 build_int_cst (reference_alias_ptr_type
-						(DR_REF (first_dr)), 0));
+				 dataref_offset
+				 ? dataref_offset
+				 : build_int_cst (reference_alias_ptr_type
+						  (DR_REF (first_dr)), 0));
 	      align = TYPE_ALIGN_UNIT (vectype);
 	      if (aligned_access_p (first_dr))
 		misalign = 0;
@@ -4181,8 +4250,9 @@  vectorizable_store (gimple stmt, gimple_
 					  TYPE_ALIGN (elem_type));
 		  misalign = DR_MISALIGNMENT (first_dr);
 		}
-	      set_ptr_info_alignment (get_ptr_info (dataref_ptr), align,
-				      misalign);
+	      if (dataref_offset == NULL_TREE)
+		set_ptr_info_alignment (get_ptr_info (dataref_ptr), align,
+					misalign);
 
 	      /* Arguments are ready.  Create the new vector stmt.  */
 	      new_stmt = gimple_build_assign (data_ref, vec_oprnd);
@@ -4314,6 +4384,7 @@  vectorizable_load (gimple stmt, gimple_s
   tree dummy;
   enum dr_alignment_support alignment_support_scheme;
   tree dataref_ptr = NULL_TREE;
+  tree dataref_offset = NULL_TREE;
   gimple ptr_incr = NULL;
   int nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies;
@@ -4947,9 +5018,32 @@  vectorizable_load (gimple stmt, gimple_s
     {
       /* 1. Create the vector or array pointer update chain.  */
       if (j == 0)
-        dataref_ptr = vect_create_data_ref_ptr (first_stmt, aggr_type, at_loop,
-						offset, &dummy, gsi,
-						&ptr_incr, false, &inv_p);
+	{
+	  bool simd_lane_access_p
+	    = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info);
+	  if (simd_lane_access_p
+	      && TREE_CODE (DR_BASE_ADDRESS (first_dr)) == ADDR_EXPR
+	      && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr), 0))
+	      && integer_zerop (DR_OFFSET (first_dr))
+	      && integer_zerop (DR_INIT (first_dr))
+	      && alias_sets_conflict_p (get_alias_set (aggr_type),
+					get_alias_set (DR_REF (first_dr)))
+	      && (alignment_support_scheme == dr_aligned
+		  || alignment_support_scheme == dr_unaligned_supported))
+	    {
+	      dataref_ptr = unshare_expr (DR_BASE_ADDRESS (first_dr));
+	      dataref_offset = build_int_cst (reference_alias_ptr_type
+					      (DR_REF (first_dr)), 0);
+	    }
+	  else
+	    dataref_ptr
+	      = vect_create_data_ref_ptr (first_stmt, aggr_type, at_loop,
+					  offset, &dummy, gsi, &ptr_incr,
+					  simd_lane_access_p, &inv_p);
+	}
+      else if (dataref_offset)
+	dataref_offset = int_const_binop (PLUS_EXPR, dataref_offset,
+					  TYPE_SIZE_UNIT (aggr_type));
       else
         dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
 				       TYPE_SIZE_UNIT (aggr_type));
@@ -4999,8 +5093,10 @@  vectorizable_load (gimple stmt, gimple_s
 
 		    data_ref
 		      = build2 (MEM_REF, vectype, dataref_ptr,
-				build_int_cst (reference_alias_ptr_type
-					       (DR_REF (first_dr)), 0));
+				dataref_offset
+				? dataref_offset
+				: build_int_cst (reference_alias_ptr_type
+						 (DR_REF (first_dr)), 0));
 		    align = TYPE_ALIGN_UNIT (vectype);
 		    if (alignment_support_scheme == dr_aligned)
 		      {
@@ -5022,8 +5118,9 @@  vectorizable_load (gimple stmt, gimple_s
 						TYPE_ALIGN (elem_type));
 			misalign = DR_MISALIGNMENT (first_dr);
 		      }
-		    set_ptr_info_alignment (get_ptr_info (dataref_ptr),
-					    align, misalign);
+		    if (dataref_offset == NULL_TREE)
+		      set_ptr_info_alignment (get_ptr_info (dataref_ptr),
+					      align, misalign);
 		    break;
 		  }
 		case dr_explicit_realign:
--- gcc/internal-fn.c.jj	2013-06-26 12:09:41.420509346 +0200
+++ gcc/internal-fn.c	2013-06-26 13:20:56.908803415 +0200
@@ -109,6 +109,30 @@  expand_STORE_LANES (gimple stmt)
   expand_insn (get_multi_vector_move (type, vec_store_lanes_optab), 2, ops);
 }
 
+/* This should get expanded in adjust_simduid_builtins.  */
+
+static void
+expand_GOMP_SIMD_LANE (gimple stmt ATTRIBUTE_UNUSED)
+{
+  gcc_unreachable ();
+}
+
+/* This should get expanded in adjust_simduid_builtins.  */
+
+static void
+expand_GOMP_SIMD_VF (gimple stmt ATTRIBUTE_UNUSED)
+{
+  gcc_unreachable ();
+}
+
+/* This should get expanded in adjust_simduid_builtins.  */
+
+static void
+expand_GOMP_SIMD_LAST_LANE (gimple stmt ATTRIBUTE_UNUSED)
+{
+  gcc_unreachable ();
+}
+
 /* Routines to expand each internal function, indexed by function number.
    Each routine has the prototype:
 
--- libgomp/testsuite/libgomp.c++/simd-2.C.jj	2013-06-26 15:27:51.018093050 +0200
+++ libgomp/testsuite/libgomp.c++/simd-2.C	2013-06-26 18:38:18.383251485 +0200
@@ -0,0 +1,36 @@ 
+// { dg-do run }
+// { dg-options "-O2" }
+// { dg-additional-options "-msse2" { target sse2_runtime } }
+// { dg-additional-options "-mavx" { target avx_runtime } }
+
+extern "C" void abort ();
+__UINTPTR_TYPE__ arr[1027];
+
+__attribute__((noinline, noclone)) void
+foo ()
+{
+  int i, v;
+  #pragma omp simd private (v) safelen(16)
+  for (i = 0; i < 1027; i++)
+    arr[i] = (__UINTPTR_TYPE__) &v;
+}
+
+int
+main ()
+{
+  int i, j, cnt = 0;
+  __UINTPTR_TYPE__ arr2[16];
+  foo ();
+  for (i = 0; i < 1027; i++)
+    {
+      for (j = 0; j < cnt; j++)
+	if (arr[i] == arr2[j])
+	  break;
+      if (j != cnt)
+	continue;
+      if (cnt == 16)
+	abort ();
+      arr2[cnt++] = arr[i];
+    }
+  return 0;
+}
--- libgomp/testsuite/libgomp.c++/simd-1.C.jj	2013-06-26 13:20:56.908803415 +0200
+++ libgomp/testsuite/libgomp.c++/simd-1.C	2013-06-26 18:55:32.664371020 +0200
@@ -0,0 +1,79 @@ 
+// { dg-do run }
+// { dg-options "-O2" }
+// { dg-additional-options "-msse2" { target sse2_runtime } }
+// { dg-additional-options "-mavx" { target avx_runtime } }
+
+extern "C" void abort ();
+int a[1024] __attribute__((aligned (32))) = { 1 };
+int b[1024] __attribute__((aligned (32))) = { 1 };
+int k, m;
+struct U { U (); ~U (); int u; };
+struct V
+{
+  V () : v (8) {}
+  ~V ()
+  {
+    if (v > 38 + 4 + 3 * 1024 + 1)
+      abort ();
+  }
+  V &operator= (const V &x) { v = x.v + 1; return *this; }
+  int v;
+};
+
+__attribute__((noinline, noclone))
+U::U () : u (6)
+{
+}
+
+__attribute__((noinline, noclone))
+U::~U ()
+{
+  if (u > 38 + 4 + 3 * 1023)
+    abort ();
+}
+
+__attribute__((noinline, noclone)) int
+foo (int *p)
+{
+  int i, s = 0;
+  U u;
+  V v;
+  #pragma omp simd aligned(a, p : 32) linear(k: m + 1) \
+		   reduction(+:s) lastprivate(u, v)
+  for (i = 0; i < 1024; i++)
+    {
+      a[i] *= p[i];
+      u.u = p[i] + k;
+      k += m + 1;
+      v.v = p[i] + k;
+      s += p[i] + k;
+    }
+  if (u.u != 36 + 4 + 3 * 1023 || v.v != 36 + 4 + 3 * 1024 + 1)
+    abort ();
+  return s;
+}
+
+int
+main ()
+{
+#if __SIZEOF_INT__ >= 4
+  int i;
+  k = 4;
+  m = 2;
+  for (i = 0; i < 1024; i++)
+    {
+      a[i] = i - 512;
+      b[i] = (i - 51) % 39;
+    }
+  int s = foo (b);
+  for (i = 0; i < 1024; i++)
+    {
+      if (b[i] != (i - 51) % 39
+	  || a[i] != (i - 512) * b[i])
+	abort ();
+    }
+  if (k != 4 + 3 * 1024 || s != 1596127)
+    abort ();
+#endif
+  return 0;
+}
--- libgomp/testsuite/libgomp.c++/simd-3.C.jj	2013-06-26 20:37:55.709194915 +0200
+++ libgomp/testsuite/libgomp.c++/simd-3.C	2013-06-26 18:38:26.000000000 +0200
@@ -0,0 +1,131 @@ 
+// { dg-do run }
+// { dg-options "-O2" }
+// { dg-additional-options "-msse2" { target sse2_runtime } }
+// { dg-additional-options "-mavx" { target avx_runtime } }
+
+extern "C" void abort ();
+int a[1024] __attribute__((aligned (32))) = { 1 };
+int b[1024] __attribute__((aligned (32))) = { 1 };
+unsigned char c[1024] __attribute__((aligned (32))) = { 1 };
+int k, m;
+__UINTPTR_TYPE__ u, u2, u3;
+
+__attribute__((noinline, noclone)) int
+foo (int *p)
+{
+  int i, s = 0, s2 = 0, t, t2;
+  #pragma omp simd aligned(a, b, p : 32) linear(k: m + 1) reduction(+:s) \
+		   lastprivate (t2)
+  for (i = 0; i < 512; i++)
+    {
+      a[i] *= p[i];
+      t2 = k + p[i];
+      k += m + 1;
+      s += p[i] + k;
+      c[i]++;
+    }
+  #pragma omp simd aligned(a, b, p : 32) linear(k: m + 1) reduction(+:s2) \
+		   lastprivate (t, u, u2, u3)
+  for (i = 512; i < 1024; i++)
+    {
+      a[i] *= p[i];
+      k += m + 1;
+      t = k + p[i];
+      u = (__UINTPTR_TYPE__) &k;
+      u2 = (__UINTPTR_TYPE__) &s2;
+      u3 = (__UINTPTR_TYPE__) &t;
+      s2 += t;
+      c[i]++;
+    }
+  return s + s2 + t + t2;
+}
+
+__attribute__((noinline, noclone)) long int
+bar (int *p, long int n, long int o)
+{
+  long int i, s = 0, s2 = 0, t, t2;
+  #pragma omp simd aligned(a, b, p : 32) linear(k: m + 1) reduction(+:s) \
+		   lastprivate (t2)
+  for (i = 0; i < n; i++)
+    {
+      a[i] *= p[i];
+      t2 = k + p[i];
+      k += m + 1;
+      s += p[i] + k;
+      c[i]++;
+    }
+  #pragma omp simd aligned(a, b, p : 32) linear(k: m + 1) reduction(+:s2) \
+		   lastprivate (t, u, u2, u3)
+  for (i = n; i < o; i++)
+    {
+      a[i] *= p[i];
+      k += m + 1;
+      t = k + p[i];
+      u = (__UINTPTR_TYPE__) &k;
+      u2 = (__UINTPTR_TYPE__) &s2;
+      u3 = (__UINTPTR_TYPE__) &t;
+      s2 += t;
+      c[i]++;
+    }
+  return s + s2 + t + t2;
+}
+
+int
+main ()
+{
+#if __SIZEOF_INT__ >= 4
+  int i;
+  k = 4;
+  m = 2;
+  for (i = 0; i < 1024; i++)
+    {
+      a[i] = i - 512;
+      b[i] = (i - 51) % 39;
+      c[i] = (unsigned char) i;
+    }
+  int s = foo (b);
+  for (i = 0; i < 1024; i++)
+    {
+      if (b[i] != (i - 51) % 39
+	  || a[i] != (i - 512) * b[i]
+	  || c[i] != (unsigned char) (i + 1))
+	abort ();
+      a[i] = i - 512;
+    }
+  if (k != 4 + 3 * 1024
+      || s != 1596127 + (4 + 3 * 511 + b[511]) + (4 + 3 * 1024 + b[1023]))
+    abort ();
+  k = 4;
+  s = bar (b, 512, 1024);
+  for (i = 0; i < 1024; i++)
+    {
+      if (b[i] != (i - 51) % 39
+	  || a[i] != (i - 512) * b[i]
+	  || c[i] != (unsigned char) (i + 2))
+	abort ();
+      a[i] = i - 512;
+    }
+  if (k != 4 + 3 * 1024
+      || s != 1596127 + (4 + 3 * 511 + b[511]) + (4 + 3 * 1024 + b[1023]))
+    abort ();
+  k = 4;
+  s = bar (b, 511, 1021);
+  for (i = 0; i < 1021; i++)
+    {
+      if (b[i] != (i - 51) % 39
+	  || a[i] != (i - 512) * b[i]
+	  || c[i] != (unsigned char) (i + 3))
+	abort ();
+      a[i] = i - 512;
+    }
+  for (i = 1021; i < 1024; i++)
+    if (b[i] != (i - 51) % 39
+	|| a[i] != i - 512
+	|| c[i] != (unsigned char) (i + 2))
+      abort ();
+  if (k != 4 + 3 * 1021
+      || s != 1586803 + (4 + 3 * 510 + b[510]) + (4 + 3 * 1021 + b[1020]))
+    abort ();
+#endif
+  return 0;
+}