[RFC] #pragma omp scan inclusive vectorization

Message ID: 20190614172927.GG19695@tucnak
State: New
Series: [RFC] #pragma omp scan inclusive vectorization

Commit Message

Jakub Jelinek June 14, 2019, 5:29 p.m. UTC
Hi!

OpenMP 5.0 introduced scan reductions, like:
  #pragma omp simd reduction (inscan, +:r)
  for (int i = 0; i < 1024; i++)
    {
      r += a[i];
      #pragma omp scan inclusive(r)
      b[i] = r;
    }
where each iteration has two parts: one that computes the value of the
privatized reduction variable (the private copy is initialized with
the neutral element of the operation at the start of that part), and
then the #pragma omp scan directive changes that private variable to
hold (in this case) the inclusive partial sums.  E.g. the PSTL we now
have in libstdc++-v3/include/pstl/ makes use of these, when available,
to implement std::*_scan.  Scan reductions can also be used in
worksharing loops, but I'll get to that later.
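
For reference (not part of the patch), the directives don't change the
sequential semantics; executed serially the loop is simply
  for (int i = 0; i < 1024; i++)
    {
      r += a[i];
      b[i] = r;  /* inclusive scan: b[i] = r_in + a[0] + ... + a[i] */
    }
and the vectorized version has to produce the same values.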

Anyway, the problem is that e.g. with OpenMP user-defined reductions,
the initializer and combiner of the reduction aren't simple operations
at OpenMP lowering time; they can be method calls, constructor calls,
etc.  So we need something that preserves those initializer and
combiner snippets in the IL so that the vectorizer can optimize them
if they end up simple enough; on the other hand, it needs to be
something the normal optimizers can optimize and that actually works
even when vectorization isn't performed.

The following patch (incomplete, but far enough along that it handles
the inclusive scan for non-user-defined reductions) handles that with
more magic: it adds variants to the .GOMP_SIMD_LANE internal function
and uses the old one (0) in the user code, another variant (1) in the
initializer and another variant (2) in the combiner pattern, which the
vectorizer then needs to pattern-recognize and either vectorize or
punt on vectorization.
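
For reference, after OpenMP lowering and the usual simplifications the
loop body then contains, as documented in the comment the patch adds
to check_scan_store:
  // Initialization for input phase, store the reduction initializer:
  _20 = .GOMP_SIMD_LANE (simduid.3_14(D), 0);
  _21 = .GOMP_SIMD_LANE (simduid.3_14(D), 1);
  D.2042[_21] = 0;
  // Actual input phase:
  ...
  r.0_5 = D.2042[_20];
  _6 = _4 + r.0_5;
  D.2042[_20] = _6;
  // Initialization for scan phase:
  _25 = .GOMP_SIMD_LANE (simduid.3_14(D), 2);
  _26 = D.2043[_25];
  _27 = D.2042[_25];
  _28 = _26 + _27;
  D.2043[_25] = _28;
  D.2042[_25] = _28;
where D.2042 is the privatized "omp simd array" copy and D.2043 holds
copies of the current original list item.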
If it vectorizes it, it emits code like (optimized dump):
  <bb 5> [local count: 708669599]:
  # ivtmp.27_45 = PHI <0(4), ivtmp.27_12(5)>
  # D__lsm.39_80 = PHI <D__lsm.39_47(4), _64(5)>
  vect__4.15_49 = MEM[base: a_23(D), index: ivtmp.27_45, offset: 0B];
  _57 = VEC_PERM_EXPR <{ 0, 0, 0, 0, 0, 0, 0, 0 }, vect__4.15_49, { 0, 8, 9, 10, 11, 12, 13, 14 }>;
  _58 = vect__4.15_49 + _57;
  _59 = VEC_PERM_EXPR <{ 0, 0, 0, 0, 0, 0, 0, 0 }, _58, { 0, 1, 8, 9, 10, 11, 12, 13 }>;
  _60 = _58 + _59;
  _61 = VEC_PERM_EXPR <{ 0, 0, 0, 0, 0, 0, 0, 0 }, _60, { 0, 1, 2, 3, 8, 9, 10, 11 }>;
  _62 = _60 + _61;
  _63 = _62 + D__lsm.39_80;
  _64 = VEC_PERM_EXPR <_63, _63, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
  MEM[base: b_32(D), index: ivtmp.27_45, offset: 0B] = _63;
  ivtmp.27_12 = ivtmp.27_45 + 32;
  if (ivtmp.27_12 != 4096)
    goto <bb 5>; [83.33%]
  else
    goto <bb 6>; [16.67%]
where the _57 ... _64 sequence is the implementation of the scan directive.
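
(To see why log2(VF) such steps suffice: with a 4-lane vector
{ a0, a1, a2, a3 }, shifting in one zero and adding yields
{ a0, a0+a1, a1+a2, a2+a3 }, and shifting in two zeros and adding
yields { a0, a0+a1, a0+a1+a2, a0+a1+a2+a3 }, i.e. the inclusive scan.
The _63 = _62 + D__lsm.39_80 statement adds the carry from the
previous vector iteration, and _64 broadcasts the last lane to become
the carry for the next one.)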

Does this look reasonable?

BTW, unfortunately SSE2 can't handle these permutations, so I'll
probably need to optionally emit some other sequence when they aren't
supported (only SSE4 handles them).  In particular, what could be done
is to use whole-vector shifts and VEC_COND_EXPR to blend the neutral
element in.
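
For illustration only (not something the patch emits), a minimal SSE2
sketch of that fallback for one 4 x int vector; whole-register byte
shifts conveniently shift in zero bits, which is already the neutral
element of +, so no blend is needed in this particular case:
  #include <emmintrin.h>

  /* Inclusive scan of 4 x i32 with SSE2 only: each step shifts the
     whole register left by N lanes (zeros shift in) and adds.  For
     an operation whose neutral element isn't all-zero bits, the
     shifted-in lanes would have to be blended with the neutral
     element via a select instead.  */
  static __m128i
  inclusive_scan4 (__m128i v)
  {
    v = _mm_add_epi32 (v, _mm_slli_si128 (v, 4));  /* shift by 1 lane */
    v = _mm_add_epi32 (v, _mm_slli_si128 (v, 8));  /* shift by 2 lanes */
    return v;
  }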


	Jakub

Comments

Richard Biener June 17, 2019, 6:35 a.m. UTC | #1
On Fri, 14 Jun 2019, Jakub Jelinek wrote:

> Hi!
> 
> OpenMP 5.0 introduced scan reductions, like:
>   #pragma omp simd reduction (inscan, +:r)
>   for (int i = 0; i < 1024; i++)
>     {
>       r += a[i];
>       #pragma omp scan inclusive(r)
>       b[i] = r;
>     }
> where each iteration has two parts: one that computes the value of the
> privatized reduction variable (the private copy is initialized with
> the neutral element of the operation at the start of that part), and
> then the #pragma omp scan directive changes that private variable to
> hold (in this case) the inclusive partial sums.  E.g. the PSTL we now
> have in libstdc++-v3/include/pstl/ makes use of these, when available,
> to implement std::*_scan.  Scan reductions can also be used in
> worksharing loops, but I'll get to that later.
> 
> Anyway, the problem is that e.g. with OpenMP user-defined reductions,
> the initializer and combiner of the reduction aren't simple operations
> at OpenMP lowering time; they can be method calls, constructor calls,
> etc.  So we need something that preserves those initializer and
> combiner snippets in the IL so that the vectorizer can optimize them
> if they end up simple enough; on the other hand, it needs to be
> something the normal optimizers can optimize and that actually works
> even when vectorization isn't performed.
> 
> The following patch (incomplete, but far enough along that it handles
> the inclusive scan for non-user-defined reductions) handles that with
> more magic: it adds variants to the .GOMP_SIMD_LANE internal function
> and uses the old one (0) in the user code, another variant (1) in the
> initializer and another variant (2) in the combiner pattern, which the
> vectorizer then needs to pattern-recognize and either vectorize or
> punt on vectorization.
> If it vectorizes it, it emits code like (optimized dump):
>   <bb 5> [local count: 708669599]:
>   # ivtmp.27_45 = PHI <0(4), ivtmp.27_12(5)>
>   # D__lsm.39_80 = PHI <D__lsm.39_47(4), _64(5)>
>   vect__4.15_49 = MEM[base: a_23(D), index: ivtmp.27_45, offset: 0B];
>   _57 = VEC_PERM_EXPR <{ 0, 0, 0, 0, 0, 0, 0, 0 }, vect__4.15_49, { 0, 8, 9, 10, 11, 12, 13, 14 }>;
>   _58 = vect__4.15_49 + _57;
>   _59 = VEC_PERM_EXPR <{ 0, 0, 0, 0, 0, 0, 0, 0 }, _58, { 0, 1, 8, 9, 10, 11, 12, 13 }>;
>   _60 = _58 + _59;
>   _61 = VEC_PERM_EXPR <{ 0, 0, 0, 0, 0, 0, 0, 0 }, _60, { 0, 1, 2, 3, 8, 9, 10, 11 }>;
>   _62 = _60 + _61;
>   _63 = _62 + D__lsm.39_80;
>   _64 = VEC_PERM_EXPR <_63, _63, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
>   MEM[base: b_32(D), index: ivtmp.27_45, offset: 0B] = _63;
>   ivtmp.27_12 = ivtmp.27_45 + 32;
>   if (ivtmp.27_12 != 4096)
>     goto <bb 5>; [83.33%]
>   else
>     goto <bb 6>; [16.67%]
> where the _57 ... _64 sequence is the implementation of the scan directive.
> 
> Does this look reasonable?

Ugh, not pretty, but probably the best we can do.  Btw, can you please
add support for the SLP case with group_size == 1?  I know I'm slow
with the branch ripping out the non-SLP path, but it would save me
some extra work (possibly).

Thanks,
Richard.

> BTW, unfortunately SSE2 can't handle these permutations, so I'll
> probably need to optionally emit some other sequence when they aren't
> supported (only SSE4 handles them).  In particular, what could be done
> is to use whole-vector shifts and VEC_COND_EXPR to blend the neutral
> element in.
> 
> --- gcc/tree-vect-stmts.c.jj	2019-06-13 13:28:36.636155362 +0200
> +++ gcc/tree-vect-stmts.c	2019-06-14 19:05:18.150502242 +0200
> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-ssa-loop-niter.h"
>  #include "gimple-fold.h"
>  #include "regs.h"
> +#include "attribs.h"
>  
>  /* For lang_hooks.types.type_for_mode.  */
>  #include "langhooks.h"
> @@ -3257,7 +3258,7 @@ vectorizable_call (stmt_vec_info stmt_in
>    if (nargs == 0 || nargs > 4)
>      return false;
>  
> -  /* Ignore the argument of IFN_GOMP_SIMD_LANE, it is magic.  */
> +  /* Ignore the arguments of IFN_GOMP_SIMD_LANE, they are magic.  */
>    combined_fn cfn = gimple_call_combined_fn (stmt);
>    if (cfn == CFN_GOMP_SIMD_LANE)
>      {
> @@ -6320,6 +6321,456 @@ get_group_alias_ptr_type (stmt_vec_info
>  }
>  
>  
> +/* Function check_scan_store.
> +
> +   Check magic stores for #pragma omp scan {in,ex}clusive reductions.  */
> +
> +static bool
> +check_scan_store (stmt_vec_info stmt_info, tree vectype,
> +		  enum vect_def_type rhs_dt, bool slp, tree mask,
> +		  vect_memory_access_type memory_access_type)
> +{
> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> +  dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info);
> +  tree ref_type;
> +
> +  gcc_assert (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) > 1);
> +  if (slp
> +      || mask
> +      || memory_access_type != VMAT_CONTIGUOUS
> +      || TREE_CODE (DR_BASE_ADDRESS (dr_info->dr)) != ADDR_EXPR
> +      || !VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0))
> +      || loop_vinfo == NULL
> +      || LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
> +      || STMT_VINFO_GROUPED_ACCESS (stmt_info)
> +      || !integer_zerop (DR_OFFSET (dr_info->dr))
> +      || !integer_zerop (DR_INIT (dr_info->dr))
> +      || !(ref_type = reference_alias_ptr_type (DR_REF (dr_info->dr)))
> +      || !alias_sets_conflict_p (get_alias_set (vectype),
> +				 get_alias_set (TREE_TYPE (ref_type))))
> +    {
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			 "unsupported OpenMP scan store.\n");
> +      return false;
> +    }
> +
> +  /* We need to pattern match code built by OpenMP lowering and simplified
> +     by following optimizations into something we can handle.
> +     #pragma omp simd reduction(inscan,+:r)
> +     for (...)
> +       {
> +	 r += something ();
> +	 #pragma omp scan inclusive (r)
> +	 use (r);
> +       }
> +     shall have body with:
> +       // Initialization for input phase, store the reduction initializer:
> +       _20 = .GOMP_SIMD_LANE (simduid.3_14(D), 0);
> +       _21 = .GOMP_SIMD_LANE (simduid.3_14(D), 1);
> +       D.2042[_21] = 0;
> +       // Actual input phase:
> +       ...
> +       r.0_5 = D.2042[_20];
> +       _6 = _4 + r.0_5;
> +       D.2042[_20] = _6;
> +       // Initialization for scan phase:
> +       _25 = .GOMP_SIMD_LANE (simduid.3_14(D), 2);
> +       _26 = D.2043[_25];
> +       _27 = D.2042[_25];
> +       _28 = _26 + _27;
> +       D.2043[_25] = _28;
> +       D.2042[_25] = _28;
> +       // Actual scan phase:
> +       ...
> +       r.1_8 = D.2042[_20];
> +       ...
> +     The "omp simd array" variable D.2042 holds the privatized copy used
> +     inside of the loop and D.2043 is another one that holds copies of
> +     the current original list item.  The separate GOMP_SIMD_LANE ifn
> +     kinds are there in order to allow optimizing the initializer store
> +     and combiner sequence, e.g. if it is originally some C++ish user
> +     defined reduction, but allow the vectorizer to pattern recognize it
> +     and turn into the appropriate vectorized scan.  */
> +
> +  if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) == 2)
> +    {
> +      /* Match the D.2042[_21] = 0; store above.  Just require that
> +	 it is a constant or external definition store.  */
> +      if (rhs_dt != vect_constant_def && rhs_dt != vect_external_def)
> +	{
> +	 fail_init:
> +	  if (dump_enabled_p ())
> +	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			     "unsupported OpenMP scan initializer store.\n");
> +	  return false;
> +	}
> +
> +      if (! loop_vinfo->scan_map)
> +	loop_vinfo->scan_map = new hash_map<tree, tree>;
> +      tree var = TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0);
> +      tree &cached = loop_vinfo->scan_map->get_or_insert (var);
> +      if (cached)
> +	goto fail_init;
> +      cached = gimple_assign_rhs1 (STMT_VINFO_STMT (stmt_info));
> +
> +      /* These stores can be vectorized normally.  */
> +      return true;
> +    }
> +
> +  if (rhs_dt != vect_internal_def)
> +    {
> +     fail:
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			 "unsupported OpenMP scan combiner pattern.\n");
> +      return false;
> +    }
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  tree rhs = gimple_assign_rhs1 (stmt);
> +  if (TREE_CODE (rhs) != SSA_NAME)
> +    goto fail;
> +
> +  use_operand_p use_p;
> +  imm_use_iterator iter;
> +  gimple *other_store_stmt = NULL;
> +  FOR_EACH_IMM_USE_FAST (use_p, iter, rhs)
> +    {
> +      gimple *use_stmt = USE_STMT (use_p);
> +      if (use_stmt == stmt || is_gimple_debug (use_stmt))
> +	continue;
> +      if (gimple_bb (use_stmt) != gimple_bb (stmt)
> +	  || !gimple_store_p (use_stmt)
> +	  || other_store_stmt)
> +	goto fail;
> +      other_store_stmt = use_stmt;
> +    }
> +  if (other_store_stmt == NULL)
> +    goto fail;
> +  stmt_vec_info other_store_stmt_info
> +    = loop_vinfo->lookup_stmt (other_store_stmt);
> +  if (other_store_stmt_info == NULL
> +      || STMT_VINFO_SIMD_LANE_ACCESS_P (other_store_stmt_info) != 3)
> +    goto fail;
> +
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (rhs);
> +  if (gimple_bb (def_stmt) != gimple_bb (stmt)
> +      || !is_gimple_assign (def_stmt)
> +      || gimple_assign_rhs_class (def_stmt) != GIMPLE_BINARY_RHS)
> +    goto fail;
> +
> +  enum tree_code code = gimple_assign_rhs_code (def_stmt);
> +  /* For pointer addition, we should use the normal plus for the vector
> +     operation.  */
> +  switch (code)
> +    {
> +    case POINTER_PLUS_EXPR:
> +      code = PLUS_EXPR;
> +      break;
> +    case MULT_HIGHPART_EXPR:
> +      goto fail;
> +    default:
> +      break;
> +    }
> +  if (TREE_CODE_LENGTH (code) != binary_op || !commutative_tree_code (code))
> +    goto fail;
> +
> +  tree rhs1 = gimple_assign_rhs1 (def_stmt);
> +  tree rhs2 = gimple_assign_rhs2 (def_stmt);
> +  if (TREE_CODE (rhs1) != SSA_NAME
> +      || TREE_CODE (rhs2) != SSA_NAME)
> +    goto fail;
> +
> +  gimple *load1_stmt = SSA_NAME_DEF_STMT (rhs1);
> +  gimple *load2_stmt = SSA_NAME_DEF_STMT (rhs2);
> +  if (gimple_bb (load1_stmt) != gimple_bb (stmt)
> +      || !gimple_assign_load_p (load1_stmt)
> +      || gimple_bb (load2_stmt) != gimple_bb (stmt)
> +      || !gimple_assign_load_p (load2_stmt))
> +    goto fail;
> +
> +  stmt_vec_info load1_stmt_info = loop_vinfo->lookup_stmt (load1_stmt);
> +  stmt_vec_info load2_stmt_info = loop_vinfo->lookup_stmt (load2_stmt);
> +  if (load1_stmt_info == NULL
> +      || load2_stmt_info == NULL
> +      || STMT_VINFO_SIMD_LANE_ACCESS_P (load1_stmt_info) != 3
> +      || STMT_VINFO_SIMD_LANE_ACCESS_P (load2_stmt_info) != 3)
> +    goto fail;
> +
> +  if (operand_equal_p (gimple_assign_lhs (stmt),
> +		       gimple_assign_rhs1 (load2_stmt), 0))
> +    {
> +      std::swap (rhs1, rhs2);
> +      std::swap (load1_stmt, load2_stmt);
> +      std::swap (load1_stmt_info, load2_stmt_info);
> +    }
> +  if (!operand_equal_p (gimple_assign_lhs (stmt),
> +			gimple_assign_rhs1 (load1_stmt), 0)
> +      || !operand_equal_p (gimple_assign_lhs (other_store_stmt),
> +			   gimple_assign_rhs1 (load2_stmt), 0))
> +    goto fail;
> +
> +  dr_vec_info *other_dr_info = STMT_VINFO_DR_INFO (other_store_stmt_info);
> +  if (TREE_CODE (DR_BASE_ADDRESS (other_dr_info->dr)) != ADDR_EXPR
> +      || !VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (other_dr_info->dr), 0)))
> +    goto fail;
> +
> +  tree var1 = TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0);
> +  tree var2 = TREE_OPERAND (DR_BASE_ADDRESS (other_dr_info->dr), 0);
> +  if (!lookup_attribute ("omp simd array", DECL_ATTRIBUTES (var1))
> +      || !lookup_attribute ("omp simd array", DECL_ATTRIBUTES (var2))
> +      || (!lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var1)))
> +	 == (!lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var2))))
> +    goto fail;
> +
> +  if (lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var1)))
> +    std::swap (var1, var2);
> +
> +  if (loop_vinfo->scan_map == NULL)
> +    goto fail;
> +  tree *init = loop_vinfo->scan_map->get (var1);
> +  if (init == NULL)
> +    goto fail;
> +
> +  /* The IL is as expected, now check if we can actually vectorize it.
> +       _26 = D.2043[_25];
> +       _27 = D.2042[_25];
> +       _28 = _26 + _27;
> +       D.2043[_25] = _28;
> +       D.2042[_25] = _28;
> +     should be vectorized as (where _40 is the vectorized rhs
> +     from the D.2042[_21] = 0; store):
> +       _30 = MEM <vector(8) int> [(int *)&D.2043];
> +       _31 = MEM <vector(8) int> [(int *)&D.2042];
> +       _32 = VEC_PERM_EXPR <_31, _40, { 8, 0, 1, 2, 3, 4, 5, 6 }>;
> +       _33 = _31 + _32;
> +       // _33 = { _31[0], _31[0]+_31[1], _31[1]+_31[2], ..., _31[6]+_31[7] };
> +       _34 = VEC_PERM_EXPR <_33, _40, { 8, 9, 0, 1, 2, 3, 4, 5 }>;
> +       _35 = _33 + _34;
> +       // _35 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
> +       //         _31[1]+.._31[4], ... _31[4]+.._31[7] };
> +       _36 = VEC_PERM_EXPR <_35, _40, { 8, 9, 10, 11, 0, 1, 2, 3 }>;
> +       _37 = _35 + _36;
> +       // _37 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
> +       //         _31[0]+.._31[4], ... _31[0]+.._31[7] };
> +       _38 = _30 + _37;
> +       _39 = VEC_PERM_EXPR <_38, _38, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
> +       MEM <vector(8) int> [(int *)&D.2043] = _39;
> +       MEM <vector(8) int> [(int *)&D.2042] = _38;  */
> +  enum machine_mode vec_mode = TYPE_MODE (vectype);
> +  optab optab = optab_for_tree_code (code, vectype, optab_default);
> +  if (!optab || optab_handler (optab, vec_mode) == CODE_FOR_nothing)
> +    goto fail;
> +
> +  unsigned HOST_WIDE_INT nunits;
> +  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
> +    goto fail;
> +  int units_log2 = exact_log2 (nunits);
> +  if (units_log2 <= 0)
> +    goto fail;
> +
> +  for (int i = 0; i <= units_log2; ++i)
> +    {
> +      unsigned HOST_WIDE_INT j, k;
> +      vec_perm_builder sel (nunits, nunits, 1);
> +      sel.quick_grow (nunits);
> +      if (i == units_log2)
> +	{
> +	  for (j = 0; j < nunits; ++j)
> +	    sel[j] = nunits - 1;
> +	}
> +      else
> +	{
> +	  for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
> +	    sel[j] = nunits + j;
> +	  for (k = 0; j < nunits; ++j, ++k)
> +	    sel[j] = k;
> +	}
> +      vec_perm_indices indices (sel, i == units_log2 ? 1 : 2, nunits);
> +      if (!can_vec_perm_const_p (vec_mode, indices))
> +	goto fail;
> +    }
> +
> +  return true;
> +}
> +
> +
> +/* Function vectorizable_scan_store.
> +
> +   Helper of vectorizable_store, arguments like on vectorizable_store.
> +   Handle only the transformation, checking is done in check_scan_store.  */
> +
> +static bool
> +vectorizable_scan_store (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> +			 stmt_vec_info *vec_stmt, int ncopies)
> +{
> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> +  dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info);
> +  tree ref_type = reference_alias_ptr_type (DR_REF (dr_info->dr));
> +  vec_info *vinfo = stmt_info->vinfo;
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +
> +  if (dump_enabled_p ())
> +    dump_printf_loc (MSG_NOTE, vect_location,
> +		     "transform scan store. ncopies = %d\n", ncopies);
> +
> +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> +  tree rhs = gimple_assign_rhs1 (stmt);
> +  gcc_assert (TREE_CODE (rhs) == SSA_NAME);
> +
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (rhs);
> +  enum tree_code code = gimple_assign_rhs_code (def_stmt);
> +  if (code == POINTER_PLUS_EXPR)
> +    code = PLUS_EXPR;
> +  gcc_assert (TREE_CODE_LENGTH (code) == binary_op
> +	      && commutative_tree_code (code));
> +  tree rhs1 = gimple_assign_rhs1 (def_stmt);
> +  tree rhs2 = gimple_assign_rhs2 (def_stmt);
> +  gcc_assert (TREE_CODE (rhs1) == SSA_NAME && TREE_CODE (rhs2) == SSA_NAME);
> +  gimple *load1_stmt = SSA_NAME_DEF_STMT (rhs1);
> +  gimple *load2_stmt = SSA_NAME_DEF_STMT (rhs2);
> +  stmt_vec_info load1_stmt_info = loop_vinfo->lookup_stmt (load1_stmt);
> +  stmt_vec_info load2_stmt_info = loop_vinfo->lookup_stmt (load2_stmt);
> +  dr_vec_info *load1_dr_info = STMT_VINFO_DR_INFO (load1_stmt_info);
> +  dr_vec_info *load2_dr_info = STMT_VINFO_DR_INFO (load2_stmt_info);
> +  tree var1 = TREE_OPERAND (DR_BASE_ADDRESS (load1_dr_info->dr), 0);
> +  tree var2 = TREE_OPERAND (DR_BASE_ADDRESS (load2_dr_info->dr), 0);
> +
> +  if (lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var1)))
> +    {
> +      std::swap (rhs1, rhs2);
> +      std::swap (var1, var2);
> +    }
> +
> +  tree *init = loop_vinfo->scan_map->get (var1);
> +  gcc_assert (init);
> +
> +  tree var = TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0);
> +  bool inscan_var_store
> +    = lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var)) != NULL;
> +
> +  unsigned HOST_WIDE_INT nunits;
> +  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
> +    gcc_unreachable ();
> +  int units_log2 = exact_log2 (nunits);
> +  gcc_assert (units_log2 > 0);
> +  auto_vec<tree, 16> perms;
> +  perms.quick_grow (units_log2 + 1);
> +  for (int i = 0; i <= units_log2; ++i)
> +    {
> +      unsigned HOST_WIDE_INT j, k;
> +      vec_perm_builder sel (nunits, nunits, 1);
> +      sel.quick_grow (nunits);
> +      if (i == units_log2)
> +	{
> +	  for (j = 0; j < nunits; ++j)
> +	    sel[j] = nunits - 1;
> +	}
> +      else
> +	{
> +	  for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
> +	    sel[j] = nunits + j;
> +	  for (k = 0; j < nunits; ++j, ++k)
> +	    sel[j] = k;
> +	}
> +      vec_perm_indices indices (sel, i == units_log2 ? 1 : 2, nunits);
> +      perms[i] = vect_gen_perm_mask_checked (vectype, indices);
> +    }
> +
> +  stmt_vec_info prev_stmt_info = NULL;
> +  tree vec_oprnd1 = NULL_TREE;
> +  tree vec_oprnd2 = NULL_TREE;
> +  tree vec_oprnd3 = NULL_TREE;
> +  tree dataref_ptr = unshare_expr (DR_BASE_ADDRESS (dr_info->dr));
> +  tree dataref_offset = build_int_cst (ref_type, 0);
> +  tree bump = vect_get_data_ptr_increment (dr_info, vectype, VMAT_CONTIGUOUS);
> +  tree orig = NULL_TREE;
> +  for (int j = 0; j < ncopies; j++)
> +    {
> +      stmt_vec_info new_stmt_info;
> +      if (j == 0)
> +	{
> +	  vec_oprnd1 = vect_get_vec_def_for_operand (*init, stmt_info);
> +	  vec_oprnd2 = vect_get_vec_def_for_operand (rhs1, stmt_info);
> +	  vec_oprnd3 = vect_get_vec_def_for_operand (rhs2, stmt_info);
> +	  orig = vec_oprnd3;
> +	}
> +      else
> +	{
> +	  vec_oprnd1 = vect_get_vec_def_for_stmt_copy (vinfo, vec_oprnd1);
> +	  vec_oprnd2 = vect_get_vec_def_for_stmt_copy (vinfo, vec_oprnd2);
> +	  vec_oprnd3 = vect_get_vec_def_for_stmt_copy (vinfo, vec_oprnd3);
> +	  if (!inscan_var_store)
> +	    dataref_offset = int_const_binop (PLUS_EXPR, dataref_offset, bump);
> +	}
> +
> +      tree v = vec_oprnd2;
> +      for (int i = 0; i < units_log2; ++i)
> +	{
> +	  tree new_temp = make_ssa_name (vectype);
> +	  gimple *g = gimple_build_assign (new_temp, VEC_PERM_EXPR, v,
> +					   vec_oprnd1, perms[i]);
> +	  new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> +	  if (prev_stmt_info == NULL)
> +	    STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt_info;
> +	  else
> +	    STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> +	  prev_stmt_info = new_stmt_info;
> +
> +	  tree new_temp2 = make_ssa_name (vectype);
> +	  g = gimple_build_assign (new_temp2, code, v, new_temp);
> +	  new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> +	  STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> +	  prev_stmt_info = new_stmt_info;
> +
> +	  v = new_temp2;
> +	}
> +
> +      tree new_temp = make_ssa_name (vectype);
> +      gimple *g = gimple_build_assign (new_temp, code, orig, v);
> +      new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> +      STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> +      prev_stmt_info = new_stmt_info;
> +
> +      orig = make_ssa_name (vectype);
> +      g = gimple_build_assign (orig, VEC_PERM_EXPR, new_temp, new_temp,
> +			       perms[units_log2]);
> +      new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> +      STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> +      prev_stmt_info = new_stmt_info;
> +
> +      if (!inscan_var_store)
> +	{
> +	  tree data_ref = fold_build2 (MEM_REF, vectype, dataref_ptr,
> +				       dataref_offset);
> +	  vect_copy_ref_info (data_ref, DR_REF (dr_info->dr));
> +	  g = gimple_build_assign (data_ref, new_temp);
> +	  new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
> +	  STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> +	  prev_stmt_info = new_stmt_info;
> +	}
> +    }
> +
> +  if (inscan_var_store)
> +    for (int j = 0; j < ncopies; j++)
> +      {
> +	if (j != 0)
> +	  dataref_offset = int_const_binop (PLUS_EXPR, dataref_offset, bump);
> +
> +	tree data_ref = fold_build2 (MEM_REF, vectype, dataref_ptr,
> +				     dataref_offset);
> +	vect_copy_ref_info (data_ref, DR_REF (dr_info->dr));
> +	gimple *g = gimple_build_assign (data_ref, orig);
> +	stmt_vec_info new_stmt_info
> +	  = vect_finish_stmt_generation (stmt_info, g, gsi);
> +	STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> +	prev_stmt_info = new_stmt_info;
> +      }
> +  return true;
> +}
> +
> +
>  /* Function vectorizable_store.
>  
>     Check if STMT_INFO defines a non scalar data-ref (array/pointer/structure)
> @@ -6514,6 +6965,13 @@ vectorizable_store (stmt_vec_info stmt_i
>        group_size = vec_num = 1;
>      }
>  
> +  if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) > 1 && !vec_stmt)
> +    {
> +      if (!check_scan_store (stmt_info, vectype, rhs_dt, slp, mask,
> +			     memory_access_type))
> +	return false;
> +    }
> +
>    if (!vec_stmt) /* transformation not required.  */
>      {
>        STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) = memory_access_type;
> @@ -6737,6 +7195,8 @@ vectorizable_store (stmt_vec_info stmt_i
>  	}
>        return true;
>      }
> +  else if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) == 3)
> +    return vectorizable_scan_store (stmt_info, gsi, vec_stmt, ncopies);
>  
>    if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
>      DR_GROUP_STORE_COUNT (DR_GROUP_FIRST_ELEMENT (stmt_info))++;
> @@ -7162,7 +7622,7 @@ vectorizable_store (stmt_vec_info stmt_i
>  	  gcc_assert (useless_type_conversion_p (vectype,
>  						 TREE_TYPE (vec_oprnd)));
>  	  bool simd_lane_access_p
> -	    = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info);
> +	    = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) != 0;
>  	  if (simd_lane_access_p
>  	      && !loop_masks
>  	      && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
> @@ -8347,7 +8807,7 @@ vectorizable_load (stmt_vec_info stmt_in
>        if (j == 0)
>  	{
>  	  bool simd_lane_access_p
> -	    = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info);
> +	    = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) != 0;
>  	  if (simd_lane_access_p
>  	      && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
>  	      && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0))
> --- gcc/tree-vect-data-refs.c.jj	2019-06-13 12:06:17.786472401 +0200
> +++ gcc/tree-vect-data-refs.c	2019-06-14 09:52:14.920718040 +0200
> @@ -3003,6 +3003,13 @@ vect_analyze_data_ref_accesses (vec_info
>  	      || TREE_CODE (DR_INIT (drb)) != INTEGER_CST)
>  	    break;
>  
> +	  /* Different .GOMP_SIMD_LANE calls still give the same lane,
> +	     just hold extra information.  */
> +	  if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmtinfo_a)
> +	      && STMT_VINFO_SIMD_LANE_ACCESS_P (stmtinfo_b)
> +	      && data_ref_compare_tree (DR_INIT (dra), DR_INIT (drb)) == 0)
> +	    break;
> +
>  	  /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb).  */
>  	  HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra));
>  	  HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb));
> @@ -4101,7 +4108,8 @@ vect_find_stmt_data_reference (loop_p lo
>  			  DR_STEP_ALIGNMENT (newdr)
>  			    = highest_pow2_factor (step);
>  			  /* Mark as simd-lane access.  */
> -			  newdr->aux = (void *)-1;
> +			  tree arg2 = gimple_call_arg (def, 1);
> +			  newdr->aux = (void *) (-1 - tree_to_uhwi (arg2));
>  			  free_data_ref (dr);
>  			  datarefs->safe_push (newdr);
>  			  return opt_result::success ();
> @@ -4210,14 +4218,17 @@ vect_analyze_data_refs (vec_info *vinfo,
>          }
>  
>        /* See if this was detected as SIMD lane access.  */
> -      if (dr->aux == (void *)-1)
> +      if (dr->aux == (void *)-1
> +	  || dr->aux == (void *)-2
> +	  || dr->aux == (void *)-3)
>  	{
>  	  if (nested_in_vect_loop_p (loop, stmt_info))
>  	    return opt_result::failure_at (stmt_info->stmt,
>  					   "not vectorized:"
>  					   " data ref analysis failed: %G",
>  					   stmt_info->stmt);
> -	  STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) = true;
> +	  STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info)
> +	    = -(uintptr_t) dr->aux;
>  	}
>  
>        tree base = get_base_address (DR_REF (dr));
> --- gcc/tree-vectorizer.h.jj	2019-06-13 12:50:31.597926603 +0200
> +++ gcc/tree-vectorizer.h	2019-06-14 16:51:53.155792356 +0200
> @@ -491,6 +491,10 @@ typedef struct _loop_vec_info : public v
>    /* Map of IV base/step expressions to inserted name in the preheader.  */
>    hash_map<tree_operand_hash, tree> *ivexpr_map;
>  
> +  /* Map of OpenMP "omp simd array" scan variables to corresponding
> +     rhs of the store of the initializer.  */
> +  hash_map<tree, tree> *scan_map;
> +
>    /* The unrolling factor needed to SLP the loop. In case of that pure SLP is
>       applied to the loop, i.e., no unrolling is needed, this is 1.  */
>    poly_uint64 slp_unrolling_factor;
> @@ -913,7 +917,7 @@ struct _stmt_vec_info {
>    bool strided_p;
>  
>    /* For both loads and stores.  */
> -  bool simd_lane_access_p;
> +  unsigned simd_lane_access_p : 2;
>  
>    /* Classifies how the load or store is going to be implemented
>       for loop vectorization.  */
> --- gcc/tree-ssa-dce.c.jj	2019-06-13 13:28:36.763153374 +0200
> +++ gcc/tree-ssa-dce.c	2019-06-13 14:20:14.889711910 +0200
> @@ -1339,14 +1339,14 @@ eliminate_unnecessary_stmts (void)
>  		  update_stmt (stmt);
>  		  release_ssa_name (name);
>  
> -		  /* GOMP_SIMD_LANE (unless two argument) or ASAN_POISON
> +		  /* GOMP_SIMD_LANE (unless three argument) or ASAN_POISON
>  		     without lhs is not needed.  */
>  		  if (gimple_call_internal_p (stmt))
>  		    switch (gimple_call_internal_fn (stmt))
>  		      {
>  		      case IFN_GOMP_SIMD_LANE:
> -			if (gimple_call_num_args (stmt) >= 2
> -			    && !integer_nonzerop (gimple_call_arg (stmt, 1)))
> +			if (gimple_call_num_args (stmt) >= 3
> +			    && !integer_nonzerop (gimple_call_arg (stmt, 2)))
>  			  break;
>  			/* FALLTHRU */
>  		      case IFN_ASAN_POISON:
> --- gcc/testsuite/gcc.dg/vect/vect-simd-8.c.jj	2019-06-14 19:00:40.918765225 +0200
> +++ gcc/testsuite/gcc.dg/vect/vect-simd-8.c	2019-06-14 19:01:43.755798987 +0200
> @@ -0,0 +1,66 @@
> +/* { dg-require-effective-target size32plus } */
> +/* { dg-additional-options "-fopenmp-simd" } */
> +
> +#include "tree-vect.h"
> +
> +int r, a[1024], b[1024];
> +
> +__attribute__((noipa)) void
> +foo (int *a, int *b)
> +{
> +  #pragma omp simd reduction (inscan, +:r)
> +  for (int i = 0; i < 1024; i++)
> +    {
> +      r += a[i];
> +      #pragma omp scan inclusive(r)
> +      b[i] = r;
> +    }
> +}
> +
> +__attribute__((noipa)) int
> +bar (void)
> +{
> +  int s = 0;
> +  #pragma omp simd reduction (inscan, +:s)
> +  for (int i = 0; i < 1024; i++)
> +    {
> +      s += 2 * a[i];
> +      #pragma omp scan inclusive(s)
> +      b[i] = s;
> +    }
> +  return s;
> +}
> +
> +int
> +main ()
> +{
> +  int s = 0;
> +  check_vect ();
> +  for (int i = 0; i < 1024; ++i)
> +    {
> +      a[i] = i;
> +      b[i] = -1;
> +      asm ("" : "+g" (i));
> +    }
> +  foo (a, b);
> +  if (r != 1024 * 1023 / 2)
> +    abort ();
> +  for (int i = 0; i < 1024; ++i)
> +    {
> +      s += i;
> +      if (b[i] != s)
> +	abort ();
> +      else
> +	b[i] = 25;
> +    }
> +  if (bar () != 1024 * 1023)
> +    abort ();
> +  s = 0;
> +  for (int i = 0; i < 1024; ++i)
> +    {
> +      s += 2 * i;
> +      if (b[i] != s)
> +	abort ();
> +    }
> +  return 0;
> +}
> --- gcc/omp-low.c.jj	2019-06-13 13:28:36.611155753 +0200
> +++ gcc/omp-low.c	2019-06-14 18:54:14.976699854 +0200
> @@ -141,6 +141,9 @@ struct omp_context
>    /* True if lower_omp_1 should look up lastprivate conditional in parent
>       context.  */
>    bool combined_into_simd_safelen0;
> +
> +  /* True if there is nested scan context with inclusive clause.  */
> +  bool scan_inclusive;
>  };
>  
>  static splay_tree all_contexts;
> @@ -3329,11 +3332,15 @@ scan_omp_1_stmt (gimple_stmt_iterator *g
>        scan_omp_single (as_a <gomp_single *> (stmt), ctx);
>        break;
>  
> +    case GIMPLE_OMP_SCAN:
> +      if (tree clauses = gimple_omp_scan_clauses (as_a <gomp_scan *> (stmt)))
> +	if (OMP_CLAUSE_CODE (clauses) == OMP_CLAUSE_INCLUSIVE)
> +	  ctx->scan_inclusive = true;
> +      /* FALLTHRU */
>      case GIMPLE_OMP_SECTION:
>      case GIMPLE_OMP_MASTER:
>      case GIMPLE_OMP_ORDERED:
>      case GIMPLE_OMP_CRITICAL:
> -    case GIMPLE_OMP_SCAN:
>      case GIMPLE_OMP_GRID_BODY:
>        ctx = new_omp_context (stmt, ctx);
>        scan_omp (gimple_omp_body_ptr (stmt), ctx);
> @@ -3671,6 +3678,7 @@ struct omplow_simd_context {
>    omplow_simd_context () { memset (this, 0, sizeof (*this)); }
>    tree idx;
>    tree lane;
> +  tree lastlane;
>    vec<tree, va_heap> simt_eargs;
>    gimple_seq simt_dlist;
>    poly_uint64_pod max_vf;
> @@ -3682,7 +3690,8 @@ struct omplow_simd_context {
>  
>  static bool
>  lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
> -			      omplow_simd_context *sctx, tree &ivar, tree &lvar)
> +			      omplow_simd_context *sctx, tree &ivar,
> +			      tree &lvar, tree *rvar = NULL)
>  {
>    if (known_eq (sctx->max_vf, 0U))
>      {
> @@ -3738,7 +3747,27 @@ lower_rec_simd_input_clauses (tree new_v
>  	= tree_cons (get_identifier ("omp simd array"), NULL,
>  		     DECL_ATTRIBUTES (avar));
>        gimple_add_tmp_var (avar);
> -      ivar = build4 (ARRAY_REF, TREE_TYPE (new_var), avar, sctx->idx,
> +      tree iavar = avar;
> +      if (rvar)
> +	{
> +	  /* For inscan reductions, create another array temporary,
> +	     which will hold the reduced value.  */
> +	  iavar = create_tmp_var_raw (atype);
> +	  if (TREE_ADDRESSABLE (new_var))
> +	    TREE_ADDRESSABLE (iavar) = 1;
> +	  DECL_ATTRIBUTES (iavar)
> +	    = tree_cons (get_identifier ("omp simd array"), NULL,
> +			 tree_cons (get_identifier ("omp simd inscan"), NULL,
> +				    DECL_ATTRIBUTES (iavar)));
> +	  gimple_add_tmp_var (iavar);
> +	  ctx->cb.decl_map->put (avar, iavar);
> +	  if (sctx->lastlane == NULL_TREE)
> +	    sctx->lastlane = create_tmp_var (unsigned_type_node);
> +	  *rvar = build4 (ARRAY_REF, TREE_TYPE (new_var), iavar,
> +			  sctx->lastlane, NULL_TREE, NULL_TREE);
> +	  TREE_THIS_NOTRAP (*rvar) = 1;
> +	}
> +      ivar = build4 (ARRAY_REF, TREE_TYPE (new_var), iavar, sctx->idx,
>  		     NULL_TREE, NULL_TREE);
>        lvar = build4 (ARRAY_REF, TREE_TYPE (new_var), avar, sctx->lane,
>  		     NULL_TREE, NULL_TREE);
> @@ -3814,7 +3843,7 @@ lower_rec_input_clauses (tree clauses, g
>    omplow_simd_context sctx = omplow_simd_context ();
>    tree simt_lane = NULL_TREE, simtrec = NULL_TREE;
>    tree ivar = NULL_TREE, lvar = NULL_TREE, uid = NULL_TREE;
> -  gimple_seq llist[3] = { };
> +  gimple_seq llist[4] = { };
>    tree nonconst_simd_if = NULL_TREE;
>  
>    copyin_seq = NULL;
> @@ -5324,12 +5353,32 @@ lower_rec_input_clauses (tree clauses, g
>  		      new_vard = TREE_OPERAND (new_var, 0);
>  		      gcc_assert (DECL_P (new_vard));
>  		    }
> +		  tree rvar = NULL_TREE, *rvarp = NULL;
> +		  if (is_simd
> +		      && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
> +		      && OMP_CLAUSE_REDUCTION_INSCAN (c))
> +		    rvarp = &rvar;
>  		  if (is_simd
>  		      && lower_rec_simd_input_clauses (new_var, ctx, &sctx,
> -						       ivar, lvar))
> +						       ivar, lvar, rvarp))
>  		    {
> +		      if (new_vard != new_var)
> +			{
> +			  SET_DECL_VALUE_EXPR (new_vard,
> +					       build_fold_addr_expr (lvar));
> +			  DECL_HAS_VALUE_EXPR_P (new_vard) = 1;
> +			}
> +
>  		      tree ref = build_outer_var_ref (var, ctx);
>  
> +		      if (rvarp)
> +			{
> +			  gimplify_assign (ivar, ref, &llist[0]);
> +			  ref = build_outer_var_ref (var, ctx);
> +			  gimplify_assign (ref, rvar, &llist[3]);
> +			  break;
> +			}
> +
>  		      gimplify_assign (unshare_expr (ivar), x, &llist[0]);
>  
>  		      if (sctx.is_simt)
> @@ -5346,12 +5395,6 @@ lower_rec_input_clauses (tree clauses, g
>  		      ref = build_outer_var_ref (var, ctx);
>  		      gimplify_assign (ref, x, &llist[1]);
>  
> -		      if (new_vard != new_var)
> -			{
> -			  SET_DECL_VALUE_EXPR (new_vard,
> -					       build_fold_addr_expr (lvar));
> -			  DECL_HAS_VALUE_EXPR_P (new_vard) = 1;
> -			}
>  		    }
>  		  else
>  		    {
> @@ -5456,14 +5499,23 @@ lower_rec_input_clauses (tree clauses, g
>    if (sctx.lane)
>      {
>        gimple *g = gimple_build_call_internal (IFN_GOMP_SIMD_LANE,
> -					      1 + (nonconst_simd_if != NULL),
> -					      uid, nonconst_simd_if);
> +					      2 + (nonconst_simd_if != NULL),
> +					      uid, integer_zero_node,
> +					      nonconst_simd_if);
>        gimple_call_set_lhs (g, sctx.lane);
>        gimple_stmt_iterator gsi = gsi_start_1 (gimple_omp_body_ptr (ctx->stmt));
>        gsi_insert_before_without_update (&gsi, g, GSI_SAME_STMT);
>        g = gimple_build_assign (sctx.lane, INTEGER_CST,
>  			       build_int_cst (unsigned_type_node, 0));
>        gimple_seq_add_stmt (ilist, g);
> +      if (sctx.lastlane)
> +	{
> +	  g = gimple_build_call_internal (IFN_GOMP_SIMD_LAST_LANE,
> +					  2, uid, sctx.lane);
> +	  gimple_call_set_lhs (g, sctx.lastlane);
> +	  gimple_seq_add_stmt (dlist, g);
> +	  gimple_seq_add_seq (dlist, llist[3]);
> +	}
>        /* Emit reductions across SIMT lanes in log_2(simt_vf) steps.  */
>        if (llist[2])
>  	{
> @@ -5865,6 +5917,7 @@ lower_lastprivate_clauses (tree clauses,
>  		  new_var = build4 (ARRAY_REF, TREE_TYPE (val),
>  				    TREE_OPERAND (val, 0), lastlane,
>  				    NULL_TREE, NULL_TREE);
> +		  TREE_THIS_NOTRAP (new_var) = 1;
>  		}
>  	    }
>  	  else if (maybe_simt)
> @@ -8371,6 +8424,108 @@ lower_omp_ordered (gimple_stmt_iterator
>  }
>  
>  
> +/* Expand code for an OpenMP scan directive and the structured block
> +   before the scan directive.  */
> +
> +static void
> +lower_omp_scan (gimple_stmt_iterator *gsi_p, omp_context *ctx)
> +{
> +  gimple *stmt = gsi_stmt (*gsi_p);
> +  bool has_clauses
> +    = gimple_omp_scan_clauses (as_a <gomp_scan *> (stmt)) != NULL;
> +  tree lane = NULL_TREE;
> +  gimple_seq before = NULL;
> +  omp_context *octx = ctx->outer;
> +  gcc_assert (octx);
> +  bool input_phase = has_clauses ^ octx->scan_inclusive;
> +  if (gimple_code (octx->stmt) == GIMPLE_OMP_FOR
> +      && (gimple_omp_for_kind (octx->stmt) & GF_OMP_FOR_SIMD)
> +      && !gimple_omp_for_combined_into_p (octx->stmt)
> +      && octx->scan_inclusive)
> +    {
> +      if (tree c = omp_find_clause (gimple_omp_for_clauses (octx->stmt),
> +				    OMP_CLAUSE__SIMDUID_))
> +	{
> +	  tree uid = OMP_CLAUSE__SIMDUID__DECL (c);
> +	  lane = create_tmp_var (unsigned_type_node);
> +	  tree t = build_int_cst (integer_type_node, 1 + !input_phase);
> +	  gimple *g
> +	    = gimple_build_call_internal (IFN_GOMP_SIMD_LANE, 2, uid, t);
> +	  gimple_call_set_lhs (g, lane);
> +	  gimple_seq_add_stmt (&before, g);
> +	}
> +      for (tree c = gimple_omp_for_clauses (octx->stmt);
> +	   c; c = OMP_CLAUSE_CHAIN (c))
> +	if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
> +	    && OMP_CLAUSE_REDUCTION_INSCAN (c))
> +	  {
> +	    tree var = OMP_CLAUSE_DECL (c);
> +	    tree new_var = lookup_decl (var, octx);
> +	    tree val = new_var;
> +	    tree var2 = NULL_TREE;
> +	    if (DECL_HAS_VALUE_EXPR_P (new_var))
> +	      {
> +		val = DECL_VALUE_EXPR (new_var);
> +		if (TREE_CODE (val) == ARRAY_REF
> +		    && VAR_P (TREE_OPERAND (val, 0)))
> +		  {
> +		    tree v = TREE_OPERAND (val, 0);
> +		    if (lookup_attribute ("omp simd array",
> +					  DECL_ATTRIBUTES (v)))
> +		      {
> +			val = unshare_expr (val);
> +			TREE_OPERAND (val, 1) = lane;
> +			if (!input_phase)
> +			  {
> +			    var2 = lookup_decl (v, octx);
> +			    var2 = build4 (ARRAY_REF, TREE_TYPE (val),
> +					   var2, lane, NULL_TREE, NULL_TREE);
> +			    TREE_THIS_NOTRAP (var2) = 1;
> +			  }
> +			else
> +			  var2 = val;
> +		      }
> +		  }
> +	      }
> +	    if (var2 == NULL_TREE)
> +	      gcc_unreachable ();
> +	    if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
> +	      {
> +		gcc_unreachable ();
> +	      }
> +	    else
> +	      {
> +		if (input_phase)
> +		  {
> +		    /* input phase.  Set val to initializer before
> +		       the body.  */
> +		    tree x = omp_reduction_init (c, TREE_TYPE (new_var));
> +		    gimplify_assign (val, x, &before);
> +		  }
> +		else
> +		  {
> +		    /* scan phase.  */
> +		    enum tree_code code = OMP_CLAUSE_REDUCTION_CODE (c);
> +		    if (code == MINUS_EXPR)
> +		      code = PLUS_EXPR;
> +
> +		    tree x = build2 (code, TREE_TYPE (var2),
> +				     unshare_expr (var2), unshare_expr (val));
> +		    gimplify_assign (unshare_expr (var2), x, &before);
> +		    gimplify_assign (val, var2, &before);
> +		  }
> +	      }
> +	  }
> +    }
> +  else if (has_clauses)
> +    sorry_at (gimple_location (stmt),
> +	      "%<#pragma omp scan%> not supported yet");
> +  gsi_insert_seq_after (gsi_p, gimple_omp_body (stmt), GSI_SAME_STMT);
> +  gsi_insert_seq_after (gsi_p, before, GSI_SAME_STMT);
> +  gsi_replace (gsi_p, gimple_build_nop (), true);
> +}
> +
> +
>  /* Gimplify a GIMPLE_OMP_CRITICAL statement.  This is a relatively simple
>     substitution of a couple of function calls.  But in the NAMED case,
>     requires that languages coordinate a symbol name.  It is therefore
> @@ -10843,11 +10998,7 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p
>      case GIMPLE_OMP_SCAN:
>        ctx = maybe_lookup_ctx (stmt);
>        gcc_assert (ctx);
> -      gsi_insert_seq_after (gsi_p, gimple_omp_body (stmt), GSI_SAME_STMT);
> -      if (gimple_omp_scan_clauses (as_a <gomp_scan *> (stmt)))
> -	sorry_at (gimple_location (stmt),
> -		  "%<#pragma omp scan%> not supported yet");
> -      gsi_replace (gsi_p, gimple_build_nop (), true);
> +      lower_omp_scan (gsi_p, ctx);
>        break;
>      case GIMPLE_OMP_CRITICAL:
>        ctx = maybe_lookup_ctx (stmt);
> --- gcc/tree-vect-loop.c.jj	2019-06-13 13:28:36.581156223 +0200
> +++ gcc/tree-vect-loop.c	2019-06-14 14:53:10.734986707 +0200
> @@ -824,6 +824,7 @@ _loop_vec_info::_loop_vec_info (struct l
>      peeling_for_alignment (0),
>      ptr_mask (0),
>      ivexpr_map (NULL),
> +    scan_map (NULL),
>      slp_unrolling_factor (1),
>      single_scalar_iteration_cost (0),
>      vectorizable (false),
> @@ -863,8 +864,8 @@ _loop_vec_info::_loop_vec_info (struct l
>  	  gimple *stmt = gsi_stmt (si);
>  	  gimple_set_uid (stmt, 0);
>  	  add_stmt (stmt);
> -	  /* If .GOMP_SIMD_LANE call for the current loop has 2 arguments, the
> -	     second argument is the #pragma omp simd if (x) condition, when 0,
> +	  /* If .GOMP_SIMD_LANE call for the current loop has 3 arguments, the
> +	     third argument is the #pragma omp simd if (x) condition, when 0,
>  	     loop shouldn't be vectorized, when non-zero constant, it should
>  	     be vectorized normally, otherwise versioned with vectorized loop
>  	     done if the condition is non-zero at runtime.  */
> @@ -872,12 +873,12 @@ _loop_vec_info::_loop_vec_info (struct l
>  	      && is_gimple_call (stmt)
>  	      && gimple_call_internal_p (stmt)
>  	      && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE
> -	      && gimple_call_num_args (stmt) >= 2
> +	      && gimple_call_num_args (stmt) >= 3
>  	      && TREE_CODE (gimple_call_arg (stmt, 0)) == SSA_NAME
>  	      && (loop_in->simduid
>  		  == SSA_NAME_VAR (gimple_call_arg (stmt, 0))))
>  	    {
> -	      tree arg = gimple_call_arg (stmt, 1);
> +	      tree arg = gimple_call_arg (stmt, 2);
>  	      if (integer_zerop (arg) || TREE_CODE (arg) == SSA_NAME)
>  		simd_if_cond = arg;
>  	      else
> @@ -959,6 +960,7 @@ _loop_vec_info::~_loop_vec_info ()
>  
>    release_vec_loop_masks (&masks);
>    delete ivexpr_map;
> +  delete scan_map;
>  
>    loop->aux = NULL;
>  }
> 
> 	Jakub
> 
>
diff mbox series

Patch

--- gcc/tree-vect-stmts.c.jj	2019-06-13 13:28:36.636155362 +0200
+++ gcc/tree-vect-stmts.c	2019-06-14 19:05:18.150502242 +0200
@@ -54,6 +54,7 @@  along with GCC; see the file COPYING3.
 #include "tree-ssa-loop-niter.h"
 #include "gimple-fold.h"
 #include "regs.h"
+#include "attribs.h"
 
 /* For lang_hooks.types.type_for_mode.  */
 #include "langhooks.h"
@@ -3257,7 +3258,7 @@  vectorizable_call (stmt_vec_info stmt_in
   if (nargs == 0 || nargs > 4)
     return false;
 
-  /* Ignore the argument of IFN_GOMP_SIMD_LANE, it is magic.  */
+  /* Ignore the arguments of IFN_GOMP_SIMD_LANE, they are magic.  */
   combined_fn cfn = gimple_call_combined_fn (stmt);
   if (cfn == CFN_GOMP_SIMD_LANE)
     {
@@ -6320,6 +6321,456 @@  get_group_alias_ptr_type (stmt_vec_info
 }
 
 
+/* Function check_scan_store.
+
+   Check magic stores for #pragma omp scan {in,ex}clusive reductions.  */
+
+static bool
+check_scan_store (stmt_vec_info stmt_info, tree vectype,
+		  enum vect_def_type rhs_dt, bool slp, tree mask,
+		  vect_memory_access_type memory_access_type)
+{
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info);
+  tree ref_type;
+
+  gcc_assert (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) > 1);
+  if (slp
+      || mask
+      || memory_access_type != VMAT_CONTIGUOUS
+      || TREE_CODE (DR_BASE_ADDRESS (dr_info->dr)) != ADDR_EXPR
+      || !VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0))
+      || loop_vinfo == NULL
+      || LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+      || STMT_VINFO_GROUPED_ACCESS (stmt_info)
+      || !integer_zerop (DR_OFFSET (dr_info->dr))
+      || !integer_zerop (DR_INIT (dr_info->dr))
+      || !(ref_type = reference_alias_ptr_type (DR_REF (dr_info->dr)))
+      || !alias_sets_conflict_p (get_alias_set (vectype),
+				 get_alias_set (TREE_TYPE (ref_type))))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "unsupported OpenMP scan store.\n");
+      return false;
+    }
+
+  /* We need to pattern match code built by OpenMP lowering and simplified
+     by following optimizations into something we can handle.
+     #pragma omp simd reduction(inscan,+:r)
+     for (...)
+       {
+	 r += something ();
+	 #pragma omp scan inclusive (r)
+	 use (r);
+       }
+     shall have body with:
+       // Initialization for input phase, store the reduction initializer:
+       _20 = .GOMP_SIMD_LANE (simduid.3_14(D), 0);
+       _21 = .GOMP_SIMD_LANE (simduid.3_14(D), 1);
+       D.2042[_21] = 0;
+       // Actual input phase:
+       ...
+       r.0_5 = D.2042[_20];
+       _6 = _4 + r.0_5;
+       D.2042[_20] = _6;
+       // Initialization for scan phase:
+       _25 = .GOMP_SIMD_LANE (simduid.3_14(D), 2);
+       _26 = D.2043[_25];
+       _27 = D.2042[_25];
+       _28 = _26 + _27;
+       D.2043[_25] = _28;
+       D.2042[_25] = _28;
+       // Actual scan phase:
+       ...
+       r.1_8 = D.2042[_20];
+       ...
+     The "omp simd array" variable D.2042 holds the privatized copy used
+     inside of the loop and D.2043 is another one that holds copies of
+     the current original list item.  The separate GOMP_SIMD_LANE ifn
+     kinds are there in order to allow optimizing the initializer store
+     and combiner sequence, e.g. if it is originally some C++ish user
+     defined reduction, but allow the vectorizer to pattern recognize it
+     and turn into the appropriate vectorized scan.  */
+
+  if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) == 2)
+    {
+      /* Match the D.2042[_21] = 0; store above.  Just require that
+	 it is a constant or external definition store.  */
+      if (rhs_dt != vect_constant_def && rhs_dt != vect_external_def)
+	{
+	 fail_init:
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "unsupported OpenMP scan initializer store.\n");
+	  return false;
+	}
+
+      if (! loop_vinfo->scan_map)
+	loop_vinfo->scan_map = new hash_map<tree, tree>;
+      tree var = TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0);
+      tree &cached = loop_vinfo->scan_map->get_or_insert (var);
+      if (cached)
+	goto fail_init;
+      cached = gimple_assign_rhs1 (STMT_VINFO_STMT (stmt_info));
+
+      /* These stores can be vectorized normally.  */
+      return true;
+    }
+
+  if (rhs_dt != vect_internal_def)
+    {
+     fail:
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "unsupported OpenMP scan combiner pattern.\n");
+      return false;
+    }
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  tree rhs = gimple_assign_rhs1 (stmt);
+  if (TREE_CODE (rhs) != SSA_NAME)
+    goto fail;
+
+  use_operand_p use_p;
+  imm_use_iterator iter;
+  gimple *other_store_stmt = NULL;
+  FOR_EACH_IMM_USE_FAST (use_p, iter, rhs)
+    {
+      gimple *use_stmt = USE_STMT (use_p);
+      if (use_stmt == stmt || is_gimple_debug (use_stmt))
+	continue;
+      if (gimple_bb (use_stmt) != gimple_bb (stmt)
+	  || !gimple_store_p (use_stmt)
+	  || other_store_stmt)
+	goto fail;
+      other_store_stmt = use_stmt;
+    }
+  if (other_store_stmt == NULL)
+    goto fail;
+  stmt_vec_info other_store_stmt_info
+    = loop_vinfo->lookup_stmt (other_store_stmt);
+  if (other_store_stmt_info == NULL
+      || STMT_VINFO_SIMD_LANE_ACCESS_P (other_store_stmt_info) != 3)
+    goto fail;
+
+  gimple *def_stmt = SSA_NAME_DEF_STMT (rhs);
+  if (gimple_bb (def_stmt) != gimple_bb (stmt)
+      || !is_gimple_assign (def_stmt)
+      || gimple_assign_rhs_class (def_stmt) != GIMPLE_BINARY_RHS)
+    goto fail;
+
+  enum tree_code code = gimple_assign_rhs_code (def_stmt);
+  /* For pointer addition, we should use the normal plus for the vector
+     operation.  */
+  switch (code)
+    {
+    case POINTER_PLUS_EXPR:
+      code = PLUS_EXPR;
+      break;
+    case MULT_HIGHPART_EXPR:
+      goto fail;
+    default:
+      break;
+    }
+  if (TREE_CODE_LENGTH (code) != binary_op || !commutative_tree_code (code))
+    goto fail;
+
+  tree rhs1 = gimple_assign_rhs1 (def_stmt);
+  tree rhs2 = gimple_assign_rhs2 (def_stmt);
+  if (TREE_CODE (rhs1) != SSA_NAME
+      || TREE_CODE (rhs2) != SSA_NAME)
+    goto fail;
+
+  gimple *load1_stmt = SSA_NAME_DEF_STMT (rhs1);
+  gimple *load2_stmt = SSA_NAME_DEF_STMT (rhs2);
+  if (gimple_bb (load1_stmt) != gimple_bb (stmt)
+      || !gimple_assign_load_p (load1_stmt)
+      || gimple_bb (load2_stmt) != gimple_bb (stmt)
+      || !gimple_assign_load_p (load2_stmt))
+    goto fail;
+
+  stmt_vec_info load1_stmt_info = loop_vinfo->lookup_stmt (load1_stmt);
+  stmt_vec_info load2_stmt_info = loop_vinfo->lookup_stmt (load2_stmt);
+  if (load1_stmt_info == NULL
+      || load2_stmt_info == NULL
+      || STMT_VINFO_SIMD_LANE_ACCESS_P (load1_stmt_info) != 3
+      || STMT_VINFO_SIMD_LANE_ACCESS_P (load2_stmt_info) != 3)
+    goto fail;
+
+  if (operand_equal_p (gimple_assign_lhs (stmt),
+		       gimple_assign_rhs1 (load2_stmt), 0))
+    {
+      std::swap (rhs1, rhs2);
+      std::swap (load1_stmt, load2_stmt);
+      std::swap (load1_stmt_info, load2_stmt_info);
+    }
+  if (!operand_equal_p (gimple_assign_lhs (stmt),
+			gimple_assign_rhs1 (load1_stmt), 0)
+      || !operand_equal_p (gimple_assign_lhs (other_store_stmt),
+			   gimple_assign_rhs1 (load2_stmt), 0))
+    goto fail;
+
+  dr_vec_info *other_dr_info = STMT_VINFO_DR_INFO (other_store_stmt_info);
+  if (TREE_CODE (DR_BASE_ADDRESS (other_dr_info->dr)) != ADDR_EXPR
+      || !VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (other_dr_info->dr), 0)))
+    goto fail;
+
+  tree var1 = TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0);
+  tree var2 = TREE_OPERAND (DR_BASE_ADDRESS (other_dr_info->dr), 0);
+  if (!lookup_attribute ("omp simd array", DECL_ATTRIBUTES (var1))
+      || !lookup_attribute ("omp simd array", DECL_ATTRIBUTES (var2))
+      || (!lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var1)))
+	 == (!lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var2))))
+    goto fail;
+
+  if (lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var1)))
+    std::swap (var1, var2);
+
+  if (loop_vinfo->scan_map == NULL)
+    goto fail;
+  tree *init = loop_vinfo->scan_map->get (var1);
+  if (init == NULL)
+    goto fail;
+
+  /* The IL is as expected, now check if we can actually vectorize it.
+       _26 = D.2043[_25];
+       _27 = D.2042[_25];
+       _28 = _26 + _27;
+       D.2043[_25] = _28;
+       D.2042[_25] = _28;
+     should be vectorized as (where _40 is the vectorized rhs
+     from the D.2042[_21] = 0; store):
+       _30 = MEM <vector(8) int> [(int *)&D.2043];
+       _31 = MEM <vector(8) int> [(int *)&D.2042];
+       _32 = VEC_PERM_EXPR <_31, _40, { 8, 0, 1, 2, 3, 4, 5, 6 }>;
+       _33 = _31 + _32;
+       // _33 = { _31[0], _31[0]+_31[1], _31[1]+_31[2], ..., _31[6]+_31[7] };
+       _34 = VEC_PERM_EXPR <_33, _40, { 8, 9, 0, 1, 2, 3, 4, 5 }>;
+       _35 = _33 + _34;
+       // _35 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
+       //         _31[1]+.._31[4], ... _31[4]+.._31[7] };
+       _36 = VEC_PERM_EXPR <_35, _40, { 8, 9, 10, 11, 0, 1, 2, 3 }>;
+       _37 = _35 + _36;
+       // _37 = { _31[0], _31[0]+_31[1], _31[0]+.._31[2], _31[0]+.._31[3],
+       //         _31[0]+.._31[4], ... _31[0]+.._31[7] };
+       _38 = _30 + _37;
+       _39 = VEC_PERM_EXPR <_38, _38, { 7, 7, 7, 7, 7, 7, 7, 7 }>;
+       MEM <vector(8) int> [(int *)&D.2043] = _39;
+       MEM <vector(8) int> [(int *)&D.2042] = _38;  */
+  enum machine_mode vec_mode = TYPE_MODE (vectype);
+  optab optab = optab_for_tree_code (code, vectype, optab_default);
+  if (!optab || optab_handler (optab, vec_mode) == CODE_FOR_nothing)
+    goto fail;
+
+  unsigned HOST_WIDE_INT nunits;
+  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
+    goto fail;
+  int units_log2 = exact_log2 (nunits);
+  if (units_log2 <= 0)
+    goto fail;
+
+  for (int i = 0; i <= units_log2; ++i)
+    {
+      unsigned HOST_WIDE_INT j, k;
+      vec_perm_builder sel (nunits, nunits, 1);
+      sel.quick_grow (nunits);
+      if (i == units_log2)
+	{
+	  for (j = 0; j < nunits; ++j)
+	    sel[j] = nunits - 1;
+	}
+      else
+	{
+	  for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
+	    sel[j] = nunits + j;
+	  for (k = 0; j < nunits; ++j, ++k)
+	    sel[j] = k;
+	}
+      vec_perm_indices indices (sel, i == units_log2 ? 1 : 2, nunits);
+      if (!can_vec_perm_const_p (vec_mode, indices))
+	goto fail;
+    }
+
+  return true;
+}
+
+
+/* Function vectorizable_scan_store.
+
+   Helper of vectorizable_score, arguments like on vectorizable_store.
+   Handle only the transformation, checking is done in check_scan_store.  */
+
+static bool
+vectorizable_scan_store (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
+			 stmt_vec_info *vec_stmt, int ncopies)
+{
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  dr_vec_info *dr_info = STMT_VINFO_DR_INFO (stmt_info);
+  tree ref_type = reference_alias_ptr_type (DR_REF (dr_info->dr));
+  vec_info *vinfo = stmt_info->vinfo;
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+
+  if (dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "transform scan store. ncopies = %d\n", ncopies);
+
+  gimple *stmt = STMT_VINFO_STMT (stmt_info);
+  tree rhs = gimple_assign_rhs1 (stmt);
+  gcc_assert (TREE_CODE (rhs) == SSA_NAME);
+
+  gimple *def_stmt = SSA_NAME_DEF_STMT (rhs);
+  enum tree_code code = gimple_assign_rhs_code (def_stmt);
+  if (code == POINTER_PLUS_EXPR)
+    code = PLUS_EXPR;
+  gcc_assert (TREE_CODE_LENGTH (code) == binary_op
+	      && commutative_tree_code (code));
+  tree rhs1 = gimple_assign_rhs1 (def_stmt);
+  tree rhs2 = gimple_assign_rhs2 (def_stmt);
+  gcc_assert (TREE_CODE (rhs1) == SSA_NAME && TREE_CODE (rhs2) == SSA_NAME);
+  gimple *load1_stmt = SSA_NAME_DEF_STMT (rhs1);
+  gimple *load2_stmt = SSA_NAME_DEF_STMT (rhs2);
+  stmt_vec_info load1_stmt_info = loop_vinfo->lookup_stmt (load1_stmt);
+  stmt_vec_info load2_stmt_info = loop_vinfo->lookup_stmt (load2_stmt);
+  dr_vec_info *load1_dr_info = STMT_VINFO_DR_INFO (load1_stmt_info);
+  dr_vec_info *load2_dr_info = STMT_VINFO_DR_INFO (load2_stmt_info);
+  tree var1 = TREE_OPERAND (DR_BASE_ADDRESS (load1_dr_info->dr), 0);
+  tree var2 = TREE_OPERAND (DR_BASE_ADDRESS (load2_dr_info->dr), 0);
+
+  if (lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var1)))
+    {
+      std::swap (rhs1, rhs2);
+      std::swap (var1, var2);
+    }
+
+  tree *init = loop_vinfo->scan_map->get (var1);
+  gcc_assert (init);
+
+  tree var = TREE_OPERAND (DR_BASE_ADDRESS (dr_info->dr), 0);
+  bool inscan_var_store
+    = lookup_attribute ("omp simd inscan", DECL_ATTRIBUTES (var)) != NULL;
+
+  unsigned HOST_WIDE_INT nunits;
+  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
+    gcc_unreachable ();
+  int units_log2 = exact_log2 (nunits);
+  gcc_assert (units_log2 > 0);
+  auto_vec<tree, 16> perms;
+  perms.quick_grow (units_log2 + 1);
+  for (int i = 0; i <= units_log2; ++i)
+    {
+      unsigned HOST_WIDE_INT j, k;
+      vec_perm_builder sel (nunits, nunits, 1);
+      sel.quick_grow (nunits);
+      if (i == units_log2)
+	{
+	  for (j = 0; j < nunits; ++j)
+	    sel[j] = nunits - 1;
+	}
+      else
+	{
+	  for (j = 0; j < (HOST_WIDE_INT_1U << i); ++j)
+	    sel[j] = nunits + j;
+	  for (k = 0; j < nunits; ++j, ++k)
+	    sel[j] = k;
+	}
+      vec_perm_indices indices (sel, i == units_log2 ? 1 : 2, nunits);
+      perms[i] = vect_gen_perm_mask_checked (vectype, indices);
+    }
+
+  stmt_vec_info prev_stmt_info = NULL;
+  tree vec_oprnd1 = NULL_TREE;
+  tree vec_oprnd2 = NULL_TREE;
+  tree vec_oprnd3 = NULL_TREE;
+  tree dataref_ptr = unshare_expr (DR_BASE_ADDRESS (dr_info->dr));
+  tree dataref_offset = build_int_cst (ref_type, 0);
+  tree bump = vect_get_data_ptr_increment (dr_info, vectype, VMAT_CONTIGUOUS);
+  tree orig = NULL_TREE;
+  for (int j = 0; j < ncopies; j++)
+    {
+      stmt_vec_info new_stmt_info;
+      if (j == 0)
+	{
+	  vec_oprnd1 = vect_get_vec_def_for_operand (*init, stmt_info);
+	  vec_oprnd2 = vect_get_vec_def_for_operand (rhs1, stmt_info);
+	  vec_oprnd3 = vect_get_vec_def_for_operand (rhs2, stmt_info);
+	  orig = vec_oprnd3;
+	}
+      else
+	{
+	  vec_oprnd1 = vect_get_vec_def_for_stmt_copy (vinfo, vec_oprnd1);
+	  vec_oprnd2 = vect_get_vec_def_for_stmt_copy (vinfo, vec_oprnd2);
+	  vec_oprnd3 = vect_get_vec_def_for_stmt_copy (vinfo, vec_oprnd3);
+	  if (!inscan_var_store)
+	    dataref_offset = int_const_binop (PLUS_EXPR, dataref_offset, bump);
+	}
+
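+      /* Emit the inclusive scan of this copy as log2 (NUNITS)
+	 shift-and-combine steps: after step I lane J holds the
+	 reduction of lanes MAX (J - (1 << (I + 1)) + 1, 0) .. J, so
+	 after the last step each lane holds the reduction of all
+	 lanes up to and including itself.  */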
+      tree v = vec_oprnd2;
+      for (int i = 0; i < units_log2; ++i)
+	{
+	  tree new_temp = make_ssa_name (vectype);
+	  gimple *g = gimple_build_assign (new_temp, VEC_PERM_EXPR, v,
+					   vec_oprnd1, perms[i]);
+	  new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
+	  if (prev_stmt_info == NULL)
+	    STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt_info;
+	  else
+	    STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
+	  prev_stmt_info = new_stmt_info;
+
+	  tree new_temp2 = make_ssa_name (vectype);
+	  g = gimple_build_assign (new_temp2, code, v, new_temp);
+	  new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
+	  STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
+	  prev_stmt_info = new_stmt_info;
+
+	  v = new_temp2;
+	}
+
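+      /* Add the carry-in from the previous copy and broadcast the last
+	 lane of the result as the carry-out for the next one.  */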
+      tree new_temp = make_ssa_name (vectype);
+      gimple *g = gimple_build_assign (new_temp, code, orig, v);
+      new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
+      STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
+      prev_stmt_info = new_stmt_info;
+
+      orig = make_ssa_name (vectype);
+      g = gimple_build_assign (orig, VEC_PERM_EXPR, new_temp, new_temp,
+			       perms[units_log2]);
+      new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
+      STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
+      prev_stmt_info = new_stmt_info;
+
+      if (!inscan_var_store)
+	{
+	  tree data_ref = fold_build2 (MEM_REF, vectype, dataref_ptr,
+				       dataref_offset);
+	  vect_copy_ref_info (data_ref, DR_REF (dr_info->dr));
+	  g = gimple_build_assign (data_ref, new_temp);
+	  new_stmt_info = vect_finish_stmt_generation (stmt_info, g, gsi);
+	  STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
+	  prev_stmt_info = new_stmt_info;
+	}
+    }
+
+  if (inscan_var_store)
+    for (int j = 0; j < ncopies; j++)
+      {
+	if (j != 0)
+	  dataref_offset = int_const_binop (PLUS_EXPR, dataref_offset, bump);
+
+	tree data_ref = fold_build2 (MEM_REF, vectype, dataref_ptr,
+				     dataref_offset);
+	vect_copy_ref_info (data_ref, DR_REF (dr_info->dr));
+	gimple *g = gimple_build_assign (data_ref, orig);
+	stmt_vec_info new_stmt_info
+	  = vect_finish_stmt_generation (stmt_info, g, gsi);
+	STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
+	prev_stmt_info = new_stmt_info;
+      }
+  return true;
+}
+
+
 /* Function vectorizable_store.
 
    Check if STMT_INFO defines a non scalar data-ref (array/pointer/structure)
@@ -6514,6 +6965,13 @@  vectorizable_store (stmt_vec_info stmt_i
       group_size = vec_num = 1;
     }
 
+  if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) > 1 && !vec_stmt)
+    {
+      if (!check_scan_store (stmt_info, vectype, rhs_dt, slp, mask,
+			     memory_access_type))
+	return false;
+    }
+
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) = memory_access_type;
@@ -6737,6 +7195,8 @@  vectorizable_store (stmt_vec_info stmt_i
 	}
       return true;
     }
+  else if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) == 3)
+    return vectorizable_scan_store (stmt_info, gsi, vec_stmt, ncopies);
 
   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
     DR_GROUP_STORE_COUNT (DR_GROUP_FIRST_ELEMENT (stmt_info))++;
@@ -7162,7 +7622,7 @@  vectorizable_store (stmt_vec_info stmt_i
 	  gcc_assert (useless_type_conversion_p (vectype,
 						 TREE_TYPE (vec_oprnd)));
 	  bool simd_lane_access_p
-	    = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info);
+	    = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) != 0;
 	  if (simd_lane_access_p
 	      && !loop_masks
 	      && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
@@ -8347,7 +8807,7 @@  vectorizable_load (stmt_vec_info stmt_in
       if (j == 0)
 	{
 	  bool simd_lane_access_p
-	    = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info);
+	    = STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) != 0;
 	  if (simd_lane_access_p
 	      && TREE_CODE (DR_BASE_ADDRESS (first_dr_info->dr)) == ADDR_EXPR
 	      && VAR_P (TREE_OPERAND (DR_BASE_ADDRESS (first_dr_info->dr), 0))
--- gcc/tree-vect-data-refs.c.jj	2019-06-13 12:06:17.786472401 +0200
+++ gcc/tree-vect-data-refs.c	2019-06-14 09:52:14.920718040 +0200
@@ -3003,6 +3003,13 @@  vect_analyze_data_ref_accesses (vec_info
 	      || TREE_CODE (DR_INIT (drb)) != INTEGER_CST)
 	    break;
 
+	  /* Different .GOMP_SIMD_LANE calls still give the same lane,
+	     they just carry extra information.  */
+	  if (STMT_VINFO_SIMD_LANE_ACCESS_P (stmtinfo_a)
+	      && STMT_VINFO_SIMD_LANE_ACCESS_P (stmtinfo_b)
+	      && data_ref_compare_tree (DR_INIT (dra), DR_INIT (drb)) == 0)
+	    break;
+
 	  /* Sorting has ensured that DR_INIT (dra) <= DR_INIT (drb).  */
 	  HOST_WIDE_INT init_a = TREE_INT_CST_LOW (DR_INIT (dra));
 	  HOST_WIDE_INT init_b = TREE_INT_CST_LOW (DR_INIT (drb));
@@ -4101,7 +4108,8 @@  vect_find_stmt_data_reference (loop_p lo
 			  DR_STEP_ALIGNMENT (newdr)
 			    = highest_pow2_factor (step);
 			  /* Mark as simd-lane access.  */
-			  newdr->aux = (void *)-1;
+			  tree arg2 = gimple_call_arg (def, 1);
+			  newdr->aux = (void *) (-1 - tree_to_uhwi (arg2));
 			  free_data_ref (dr);
 			  datarefs->safe_push (newdr);
 			  return opt_result::success ();
@@ -4210,14 +4218,17 @@  vect_analyze_data_refs (vec_info *vinfo,
         }
 
       /* See if this was detected as SIMD lane access.  */
-      if (dr->aux == (void *)-1)
+      if (dr->aux == (void *)-1
+	  || dr->aux == (void *)-2
+	  || dr->aux == (void *)-3)
 	{
 	  if (nested_in_vect_loop_p (loop, stmt_info))
 	    return opt_result::failure_at (stmt_info->stmt,
 					   "not vectorized:"
 					   " data ref analysis failed: %G",
 					   stmt_info->stmt);
-	  STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info) = true;
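+	  /* AUX was set to -1 - the .GOMP_SIMD_LANE variant, so this
+	     records variant + 1, i.e. 1, 2 or 3.  */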
+	  STMT_VINFO_SIMD_LANE_ACCESS_P (stmt_info)
+	    = -(uintptr_t) dr->aux;
 	}
 
       tree base = get_base_address (DR_REF (dr));
--- gcc/tree-vectorizer.h.jj	2019-06-13 12:50:31.597926603 +0200
+++ gcc/tree-vectorizer.h	2019-06-14 16:51:53.155792356 +0200
@@ -491,6 +491,10 @@  typedef struct _loop_vec_info : public v
   /* Map of IV base/step expressions to inserted name in the preheader.  */
   hash_map<tree_operand_hash, tree> *ivexpr_map;
 
+  /* Map of OpenMP "omp simd array" scan variables to the corresponding
+     rhs of the initializer store.  */
+  hash_map<tree, tree> *scan_map;
+
   /* The unrolling factor needed to SLP the loop. In case of that pure SLP is
      applied to the loop, i.e., no unrolling is needed, this is 1.  */
   poly_uint64 slp_unrolling_factor;
@@ -913,7 +917,7 @@  struct _stmt_vec_info {
   bool strided_p;
 
   /* For both loads and stores.  */
-  bool simd_lane_access_p;
+  unsigned simd_lane_access_p : 2;
 
   /* Classifies how the load or store is going to be implemented
      for loop vectorization.  */
--- gcc/tree-ssa-dce.c.jj	2019-06-13 13:28:36.763153374 +0200
+++ gcc/tree-ssa-dce.c	2019-06-13 14:20:14.889711910 +0200
@@ -1339,14 +1339,14 @@  eliminate_unnecessary_stmts (void)
 		  update_stmt (stmt);
 		  release_ssa_name (name);
 
-		  /* GOMP_SIMD_LANE (unless two argument) or ASAN_POISON
+		  /* GOMP_SIMD_LANE (unless it has three arguments) or ASAN_POISON
 		     without lhs is not needed.  */
 		  if (gimple_call_internal_p (stmt))
 		    switch (gimple_call_internal_fn (stmt))
 		      {
 		      case IFN_GOMP_SIMD_LANE:
-			if (gimple_call_num_args (stmt) >= 2
-			    && !integer_nonzerop (gimple_call_arg (stmt, 1)))
+			if (gimple_call_num_args (stmt) >= 3
+			    && !integer_nonzerop (gimple_call_arg (stmt, 2)))
 			  break;
 			/* FALLTHRU */
 		      case IFN_ASAN_POISON:
--- gcc/testsuite/gcc.dg/vect/vect-simd-8.c.jj	2019-06-14 19:00:40.918765225 +0200
+++ gcc/testsuite/gcc.dg/vect/vect-simd-8.c	2019-06-14 19:01:43.755798987 +0200
@@ -0,0 +1,66 @@ 
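+/* Test vectorization of #pragma omp scan inclusive reductions.  */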
+/* { dg-require-effective-target size32plus } */
+/* { dg-additional-options "-fopenmp-simd" } */
+
+#include "tree-vect.h"
+
+int r, a[1024], b[1024];
+
+__attribute__((noipa)) void
+foo (int *a, int *b)
+{
+  #pragma omp simd reduction (inscan, +:r)
+  for (int i = 0; i < 1024; i++)
+    {
+      r += a[i];
+      #pragma omp scan inclusive(r)
+      b[i] = r;
+    }
+}
+
+__attribute__((noipa)) int
+bar (void)
+{
+  int s = 0;
+  #pragma omp simd reduction (inscan, +:s)
+  for (int i = 0; i < 1024; i++)
+    {
+      s += 2 * a[i];
+      #pragma omp scan inclusive(s)
+      b[i] = s;
+    }
+  return s;
+}
+
+int
+main ()
+{
+  int s = 0;
+  check_vect ();
+  for (int i = 0; i < 1024; ++i)
+    {
+      a[i] = i;
+      b[i] = -1;
+      asm ("" : "+g" (i));
+    }
+  foo (a, b);
+  if (r != 1024 * 1023 / 2)
+    abort ();
+  for (int i = 0; i < 1024; ++i)
+    {
+      s += i;
+      if (b[i] != s)
+	abort ();
+      else
+	b[i] = 25;
+    }
+  if (bar () != 1024 * 1023)
+    abort ();
+  s = 0;
+  for (int i = 0; i < 1024; ++i)
+    {
+      s += 2 * i;
+      if (b[i] != s)
+	abort ();
+    }
+  return 0;
+}
--- gcc/omp-low.c.jj	2019-06-13 13:28:36.611155753 +0200
+++ gcc/omp-low.c	2019-06-14 18:54:14.976699854 +0200
@@ -141,6 +141,9 @@  struct omp_context
   /* True if lower_omp_1 should look up lastprivate conditional in parent
      context.  */
   bool combined_into_simd_safelen0;
+
+  /* True if there is a nested scan context with an inclusive clause.  */
+  bool scan_inclusive;
 };
 
 static splay_tree all_contexts;
@@ -3329,11 +3332,15 @@  scan_omp_1_stmt (gimple_stmt_iterator *g
       scan_omp_single (as_a <gomp_single *> (stmt), ctx);
       break;
 
+    case GIMPLE_OMP_SCAN:
+      if (tree clauses = gimple_omp_scan_clauses (as_a <gomp_scan *> (stmt)))
+	if (OMP_CLAUSE_CODE (clauses) == OMP_CLAUSE_INCLUSIVE)
+	  ctx->scan_inclusive = true;
+      /* FALLTHRU */
     case GIMPLE_OMP_SECTION:
     case GIMPLE_OMP_MASTER:
     case GIMPLE_OMP_ORDERED:
     case GIMPLE_OMP_CRITICAL:
-    case GIMPLE_OMP_SCAN:
     case GIMPLE_OMP_GRID_BODY:
       ctx = new_omp_context (stmt, ctx);
       scan_omp (gimple_omp_body_ptr (stmt), ctx);
@@ -3671,6 +3678,7 @@  struct omplow_simd_context {
   omplow_simd_context () { memset (this, 0, sizeof (*this)); }
   tree idx;
   tree lane;
+  tree lastlane;
   vec<tree, va_heap> simt_eargs;
   gimple_seq simt_dlist;
   poly_uint64_pod max_vf;
@@ -3682,7 +3690,8 @@  struct omplow_simd_context {
 
 static bool
 lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
-			      omplow_simd_context *sctx, tree &ivar, tree &lvar)
+			      omplow_simd_context *sctx, tree &ivar,
+			      tree &lvar, tree *rvar = NULL)
 {
   if (known_eq (sctx->max_vf, 0U))
     {
@@ -3738,7 +3747,27 @@  lower_rec_simd_input_clauses (tree new_v
 	= tree_cons (get_identifier ("omp simd array"), NULL,
 		     DECL_ATTRIBUTES (avar));
       gimple_add_tmp_var (avar);
-      ivar = build4 (ARRAY_REF, TREE_TYPE (new_var), avar, sctx->idx,
+      tree iavar = avar;
+      if (rvar)
+	{
+	  /* For inscan reductions, create another array temporary,
+	     which will hold the reduced value.  */
+	  iavar = create_tmp_var_raw (atype);
+	  if (TREE_ADDRESSABLE (new_var))
+	    TREE_ADDRESSABLE (iavar) = 1;
+	  DECL_ATTRIBUTES (iavar)
+	    = tree_cons (get_identifier ("omp simd array"), NULL,
+			 tree_cons (get_identifier ("omp simd inscan"), NULL,
+				    DECL_ATTRIBUTES (iavar)));
+	  gimple_add_tmp_var (iavar);
+	  ctx->cb.decl_map->put (avar, iavar);
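+	  /* *RVAR will read the final value from the lane that
+	     .GOMP_SIMD_LAST_LANE reports after the loop.  */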
+	  if (sctx->lastlane == NULL_TREE)
+	    sctx->lastlane = create_tmp_var (unsigned_type_node);
+	  *rvar = build4 (ARRAY_REF, TREE_TYPE (new_var), iavar,
+			  sctx->lastlane, NULL_TREE, NULL_TREE);
+	  TREE_THIS_NOTRAP (*rvar) = 1;
+	}
+      ivar = build4 (ARRAY_REF, TREE_TYPE (new_var), iavar, sctx->idx,
 		     NULL_TREE, NULL_TREE);
       lvar = build4 (ARRAY_REF, TREE_TYPE (new_var), avar, sctx->lane,
 		     NULL_TREE, NULL_TREE);
@@ -3814,7 +3843,7 @@  lower_rec_input_clauses (tree clauses, g
   omplow_simd_context sctx = omplow_simd_context ();
   tree simt_lane = NULL_TREE, simtrec = NULL_TREE;
   tree ivar = NULL_TREE, lvar = NULL_TREE, uid = NULL_TREE;
-  gimple_seq llist[3] = { };
+  gimple_seq llist[4] = { };
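+  /* For inscan reductions llist[3] holds the copies from the last lane
+     of the "omp simd inscan" arrays back to the original reduction
+     variables, emitted after .GOMP_SIMD_LAST_LANE.  */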
   tree nonconst_simd_if = NULL_TREE;
 
   copyin_seq = NULL;
@@ -5324,12 +5353,32 @@  lower_rec_input_clauses (tree clauses, g
 		      new_vard = TREE_OPERAND (new_var, 0);
 		      gcc_assert (DECL_P (new_vard));
 		    }
+		  tree rvar = NULL_TREE, *rvarp = NULL;
+		  if (is_simd
+		      && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
+		      && OMP_CLAUSE_REDUCTION_INSCAN (c))
+		    rvarp = &rvar;
 		  if (is_simd
 		      && lower_rec_simd_input_clauses (new_var, ctx, &sctx,
-						       ivar, lvar))
+						       ivar, lvar, rvarp))
 		    {
+		      if (new_vard != new_var)
+			{
+			  SET_DECL_VALUE_EXPR (new_vard,
+					       build_fold_addr_expr (lvar));
+			  DECL_HAS_VALUE_EXPR_P (new_vard) = 1;
+			}
+
 		      tree ref = build_outer_var_ref (var, ctx);
 
+		      if (rvarp)
+			{
+			  gimplify_assign (ivar, ref, &llist[0]);
+			  ref = build_outer_var_ref (var, ctx);
+			  gimplify_assign (ref, rvar, &llist[3]);
+			  break;
+			}
+
 		      gimplify_assign (unshare_expr (ivar), x, &llist[0]);
 
 		      if (sctx.is_simt)
@@ -5346,12 +5395,6 @@  lower_rec_input_clauses (tree clauses, g
 		      ref = build_outer_var_ref (var, ctx);
 		      gimplify_assign (ref, x, &llist[1]);
 
-		      if (new_vard != new_var)
-			{
-			  SET_DECL_VALUE_EXPR (new_vard,
-					       build_fold_addr_expr (lvar));
-			  DECL_HAS_VALUE_EXPR_P (new_vard) = 1;
-			}
 		    }
 		  else
 		    {
@@ -5456,14 +5499,23 @@  lower_rec_input_clauses (tree clauses, g
   if (sctx.lane)
     {
       gimple *g = gimple_build_call_internal (IFN_GOMP_SIMD_LANE,
-					      1 + (nonconst_simd_if != NULL),
-					      uid, nonconst_simd_if);
+					      2 + (nonconst_simd_if != NULL),
+					      uid, integer_zero_node,
+					      nonconst_simd_if);
       gimple_call_set_lhs (g, sctx.lane);
       gimple_stmt_iterator gsi = gsi_start_1 (gimple_omp_body_ptr (ctx->stmt));
       gsi_insert_before_without_update (&gsi, g, GSI_SAME_STMT);
       g = gimple_build_assign (sctx.lane, INTEGER_CST,
 			       build_int_cst (unsigned_type_node, 0));
       gimple_seq_add_stmt (ilist, g);
+      if (sctx.lastlane)
+	{
+	  g = gimple_build_call_internal (IFN_GOMP_SIMD_LAST_LANE,
+					  2, uid, sctx.lane);
+	  gimple_call_set_lhs (g, sctx.lastlane);
+	  gimple_seq_add_stmt (dlist, g);
+	  gimple_seq_add_seq (dlist, llist[3]);
+	}
       /* Emit reductions across SIMT lanes in log_2(simt_vf) steps.  */
       if (llist[2])
 	{
@@ -5865,6 +5917,7 @@  lower_lastprivate_clauses (tree clauses,
 		  new_var = build4 (ARRAY_REF, TREE_TYPE (val),
 				    TREE_OPERAND (val, 0), lastlane,
 				    NULL_TREE, NULL_TREE);
+		  TREE_THIS_NOTRAP (new_var) = 1;
 		}
 	    }
 	  else if (maybe_simt)
@@ -8371,6 +8424,108 @@  lower_omp_ordered (gimple_stmt_iterator
 }
 
 
+/* Expand code for an OpenMP scan directive and the structured block
+   before the scan directive.  */
+
+static void
+lower_omp_scan (gimple_stmt_iterator *gsi_p, omp_context *ctx)
+{
+  gimple *stmt = gsi_stmt (*gsi_p);
+  bool has_clauses
+    = gimple_omp_scan_clauses (as_a <gomp_scan *> (stmt)) != NULL;
+  tree lane = NULL_TREE;
+  gimple_seq before = NULL;
+  omp_context *octx = ctx->outer;
+  gcc_assert (octx);
+  bool input_phase = has_clauses ^ octx->scan_inclusive;
+  if (gimple_code (octx->stmt) == GIMPLE_OMP_FOR
+      && (gimple_omp_for_kind (octx->stmt) & GF_OMP_FOR_SIMD)
+      && !gimple_omp_for_combined_into_p (octx->stmt)
+      && octx->scan_inclusive)
+    {
+      if (tree c = omp_find_clause (gimple_omp_for_clauses (octx->stmt),
+				    OMP_CLAUSE__SIMDUID_))
+	{
+	  tree uid = OMP_CLAUSE__SIMDUID__DECL (c);
+	  lane = create_tmp_var (unsigned_type_node);
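+	  /* The second argument marks this .GOMP_SIMD_LANE call for the
+	     vectorizer: 1 in the input phase and 2 in the scan phase.  */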
+	  tree t = build_int_cst (integer_type_node, 1 + !input_phase);
+	  gimple *g
+	    = gimple_build_call_internal (IFN_GOMP_SIMD_LANE, 2, uid, t);
+	  gimple_call_set_lhs (g, lane);
+	  gimple_seq_add_stmt (&before, g);
+	}
+      for (tree c = gimple_omp_for_clauses (octx->stmt);
+	   c; c = OMP_CLAUSE_CHAIN (c))
+	if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
+	    && OMP_CLAUSE_REDUCTION_INSCAN (c))
+	  {
+	    tree var = OMP_CLAUSE_DECL (c);
+	    tree new_var = lookup_decl (var, octx);
+	    tree val = new_var;
+	    tree var2 = NULL_TREE;
+	    if (DECL_HAS_VALUE_EXPR_P (new_var))
+	      {
+		val = DECL_VALUE_EXPR (new_var);
+		if (TREE_CODE (val) == ARRAY_REF
+		    && VAR_P (TREE_OPERAND (val, 0)))
+		  {
+		    tree v = TREE_OPERAND (val, 0);
+		    if (lookup_attribute ("omp simd array",
+					  DECL_ATTRIBUTES (v)))
+		      {
+			val = unshare_expr (val);
+			TREE_OPERAND (val, 1) = lane;
+			if (!input_phase)
+			  {
+			    var2 = lookup_decl (v, octx);
+			    var2 = build4 (ARRAY_REF, TREE_TYPE (val),
+					   var2, lane, NULL_TREE, NULL_TREE);
+			    TREE_THIS_NOTRAP (var2) = 1;
+			  }
+			else
+			  var2 = val;
+		      }
+		  }
+	      }
+	    if (var2 == NULL_TREE)
+	      gcc_unreachable ();
+	    if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
+	      {
+		gcc_unreachable ();
+	      }
+	    else
+	      {
+		if (input_phase)
+		  {
+		    /* Input phase.  Set VAL to the initializer before
+		       the body.  */
+		    tree x = omp_reduction_init (c, TREE_TYPE (new_var));
+		    gimplify_assign (val, x, &before);
+		  }
+		else
+		  {
+		    /* Scan phase.  */
+		    enum tree_code code = OMP_CLAUSE_REDUCTION_CODE (c);
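+		    /* Even for a minus reduction the partial results
+		       are combined with addition.  */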
+		    if (code == MINUS_EXPR)
+		      code = PLUS_EXPR;
+
+		    tree x = build2 (code, TREE_TYPE (var2),
+				     unshare_expr (var2), unshare_expr (val));
+		    gimplify_assign (unshare_expr (var2), x, &before);
+		    gimplify_assign (val, var2, &before);
+		  }
+	      }
+	  }
+    }
+  else if (has_clauses)
+    sorry_at (gimple_location (stmt),
+	      "%<#pragma omp scan%> not supported yet");
+  gsi_insert_seq_after (gsi_p, gimple_omp_body (stmt), GSI_SAME_STMT);
+  gsi_insert_seq_after (gsi_p, before, GSI_SAME_STMT);
+  gsi_replace (gsi_p, gimple_build_nop (), true);
+}
+
+
 /* Gimplify a GIMPLE_OMP_CRITICAL statement.  This is a relatively simple
    substitution of a couple of function calls.  But in the NAMED case,
    requires that languages coordinate a symbol name.  It is therefore
@@ -10843,11 +10998,7 @@  lower_omp_1 (gimple_stmt_iterator *gsi_p
     case GIMPLE_OMP_SCAN:
       ctx = maybe_lookup_ctx (stmt);
       gcc_assert (ctx);
-      gsi_insert_seq_after (gsi_p, gimple_omp_body (stmt), GSI_SAME_STMT);
-      if (gimple_omp_scan_clauses (as_a <gomp_scan *> (stmt)))
-	sorry_at (gimple_location (stmt),
-		  "%<#pragma omp scan%> not supported yet");
-      gsi_replace (gsi_p, gimple_build_nop (), true);
+      lower_omp_scan (gsi_p, ctx);
       break;
     case GIMPLE_OMP_CRITICAL:
       ctx = maybe_lookup_ctx (stmt);
--- gcc/tree-vect-loop.c.jj	2019-06-13 13:28:36.581156223 +0200
+++ gcc/tree-vect-loop.c	2019-06-14 14:53:10.734986707 +0200
@@ -824,6 +824,7 @@  _loop_vec_info::_loop_vec_info (struct l
     peeling_for_alignment (0),
     ptr_mask (0),
     ivexpr_map (NULL),
+    scan_map (NULL),
     slp_unrolling_factor (1),
     single_scalar_iteration_cost (0),
     vectorizable (false),
@@ -863,8 +864,8 @@  _loop_vec_info::_loop_vec_info (struct l
 	  gimple *stmt = gsi_stmt (si);
 	  gimple_set_uid (stmt, 0);
 	  add_stmt (stmt);
-	  /* If .GOMP_SIMD_LANE call for the current loop has 2 arguments, the
-	     second argument is the #pragma omp simd if (x) condition, when 0,
+	  /* If the .GOMP_SIMD_LANE call for the current loop has 3 arguments,
+	     the third argument is the #pragma omp simd if (x) condition, when 0,
 	     loop shouldn't be vectorized, when non-zero constant, it should
 	     be vectorized normally, otherwise versioned with vectorized loop
 	     done if the condition is non-zero at runtime.  */
@@ -872,12 +873,12 @@  _loop_vec_info::_loop_vec_info (struct l
 	      && is_gimple_call (stmt)
 	      && gimple_call_internal_p (stmt)
 	      && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE
-	      && gimple_call_num_args (stmt) >= 2
+	      && gimple_call_num_args (stmt) >= 3
 	      && TREE_CODE (gimple_call_arg (stmt, 0)) == SSA_NAME
 	      && (loop_in->simduid
 		  == SSA_NAME_VAR (gimple_call_arg (stmt, 0))))
 	    {
-	      tree arg = gimple_call_arg (stmt, 1);
+	      tree arg = gimple_call_arg (stmt, 2);
 	      if (integer_zerop (arg) || TREE_CODE (arg) == SSA_NAME)
 		simd_if_cond = arg;
 	      else
@@ -959,6 +960,7 @@  _loop_vec_info::~_loop_vec_info ()
 
   release_vec_loop_masks (&masks);
   delete ivexpr_map;
+  delete scan_map;
 
   loop->aux = NULL;
 }