
Extend store merging to STRING_CST

Message ID 1958860.vJQVT1cKh3@polaris
State New
Series Extend store merging to STRING_CST

Commit Message

Eric Botcazou July 1, 2020, 6:49 a.m. UTC
Hi,

the GIMPLE store merging pass doesn't merge STRING_CSTs in the general case, 
although they are accepted by native_encode_expr; the reason is that the pass 
only works with integral modes, i.e. with chunks whose size is a power of two.

There are two possible ways of extending it to handle STRING_CSTs: 1) lift the 
condition of integral modes and treat STRING_CSTs as other _CST nodes but with 
arbitrary size; 2) implement a specific and separate handling for STRING_CSTs.

The attached patch implements 2) for the following reasons: on the one hand, 
even in Ada where character strings are first-class citizens, cases where 
merging STRING_CSTs with other *_CST nodes would be possible are quite rare in 
practice; on the other hand, string concatenations happen more naturally and 
frequently thanks to the "&" operator, giving rise to merging opportunities.

These opportunities generally occur after inlining, when a string argument 
passed in a subprogram call is a STRING_CST and is concatenated with other 
STRING_CSTs in the called subprogram.  In this case, the concatenation is 
originally expanded by means of a VLA and calls to BUILT_IN_MEMCPY on the 
array, and becomes of fixed length after inlining; in order to avoid having to 
deal with the calls to BUILT_IN_MEMCPY in the GIMPLE store merging pass, the 
patch also adds an optimization to gimple_fold_builtin_memory_op that turns 
calls to BUILT_IN_MEMCPY of STRING_CSTs into direct assignment of STRING_CSTs.
Unfortunately, doing this in the general case discombobulates the C-oriented 
string-related GIMPLE passes of the optimizer, so it is done only for the 
calls to BUILT_IN_MEMCPY that have been generated during gimplification for 
variable-sized assignments, which are thus marked with a new flag.

Tested on x86-64/Linux, OK for the mainline?


2020-07-01  Eric Botcazou  <ebotcazou@adacore.com>

	* gimple-fold.c (gimple_fold_builtin_memory_op): Fold calls that were
	initially created for the assignment of a variable-sized object.
	* gimple-ssa-store-merging.c (struct merged_store_group): Document
	STRING_CST for rhs_code field.
	Add string_concatenation boolean field.
	(merged_store_group::merged_store_group): Initialize it as well as
	bit_insertion here.
	(merged_store_group::do_merge): Set it upon seeing a STRING_CST.  Also
	set bit_insertion here upon seeing a BIT_INSERT_EXPR.
	(merged_store_group::apply_stores): Clear it for small regions.  Do not
	create a power-of-2-sized buffer if it is still true.  And do not set
	bit_insertion here again.
	(encode_tree_to_bitpos): Deal with BLKmode for the expression.
	(merged_store_group::can_be_merged_into): Deal with STRING_CST.
	(imm_store_chain_info::coalesce_immediate_stores): Set bit_insertion
	to true after changing MEM_REF stores into BIT_INSERT_EXPR stores.
	(count_multiple_uses): Return 0 for STRING_CST.
	(split_group): Do not split the group for a string concatenation.
	(imm_store_chain_info::output_merged_store): Constify and rename some
	local variables.  Build an array type as destination type for a string
	concatenation, as well as a zero mask, and call build_string to build
	the source.
	(lhs_valid_for_store_merging_p): Return true for VIEW_CONVERT_EXPR.
	(pass_store_merging::process_store): Accept STRING_CST on the RHS.
	* gimple.h (gimple_call_alloca_for_var_p): New accessor function.
	* gimplify.c (gimplify_modify_expr_to_memcpy): Set alloca_for_var.
	* tree.h (CALL_ALLOCA_FOR_VAR_P): Document it for BUILT_IN_MEMCPY.


2020-07-01  Eric Botcazou  <ebotcazou@adacore.com>

	* gnat.dg/opt87.adb: New test.
	* gnat.dg/opt87_pkg.ad[sb]: New helper.

Comments

Richard Biener July 1, 2020, 7:47 a.m. UTC | #1
On Wed, Jul 1, 2020 at 8:50 AM Eric Botcazou <botcazou@adacore.com> wrote:
>
> Hi,
>
> the GIMPLE store merging pass doesn't merge STRING_CSTs in the general case,
> although they are accepted by native_encode_expr; the reason is that the pass
> only works with integral modes, i.e. with chunks whose size is a power of two.
>
> There are two possible ways of extending it to handle STRING_CSTs: 1) lift the
> condition of integral modes and treat STRING_CSTs as other _CST nodes but with
> arbitrary size; 2) implement a specific and separate handling for STRING_CSTs.
>
> The attached patch implements 2) for the following reasons: on the one hand,
> even in Ada where character strings are first-class citizens, cases where
> merging STRING_CSTs with other *_CST nodes would be possible are quite rare in
> practice; on the other hand, string concatenations happen more naturally and
> frequently thanks to the "&" operator, giving rise to merging opportunities.
>
> These opportunities generally occur after inlining, when a string argument
> passed in a subprogram call is a STRING_CST and is concatenated with other
> STRING_CSTs in the called subprogram.  In this case, the concatenation is
> originally expanded by means of a VLA and calls to BUILT_IN_MEMCPY on the
> array, and becomes of fixed length after inlining; in order to avoid having to
> deal with the calls to BUILT_IN_MEMCPY in the GIMPLE store merging pass, the
> patch also adds an optimization to gimple_fold_builtin_memory_op that turns
> calls to BUILT_IN_MEMCPY of STRING_CSTs into direct assignment of STRING_CSTs.
> Unfortunately, doing this in the general case discombobulates the C-oriented
> string-related GIMPLE passes of the optimizer, so it is done only for the
> calls to BUILT_IN_MEMCPY that have been generated during gimplification for
> variable-sized assignments, which are thus marked with a new flag.
>
> Tested on x86-64/Linux, OK for the mainline?

+         && TREE_CODE (TREE_OPERAND (src, 0)) == STRING_CST
+         && tree_int_cst_equal
+            (TYPE_SIZE_UNIT (TREE_TYPE (TREE_OPERAND (src, 0))), len))
+       {

I guess we miss a BITS_PER_UNIT == 8 check here?

+         gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+         gimple_set_vdef (new_stmt, gimple_vdef (stmt));
+         if (gimple_vdef (new_stmt)
+             && TREE_CODE (gimple_vdef (new_stmt)) == SSA_NAME)
+           SSA_NAME_DEF_STMT (gimple_vdef (new_stmt)) = new_stmt;

you can use gimple_move_vops (new_stmt, stmt) for this.

I wonder if there are objects beside STRING_CSTs that could have their
sizes fixed via inlining, thus, whether you can omit the == STRING_CST
check?  That is, I see this change independent of the store-merging
optimization.

Otherwise the memcpy folding part looks OK to me, I skimmed the
store-merging change and didn't find anything suspicious but I wonder
whether not special-casing STRING_CSTs would actually simplify
the code?

Thanks,
Richard.


>
> 2020-07-01  Eric Botcazou  <ebotcazou@adacore.com>
>
>         * gimple-fold.c (gimple_fold_builtin_memory_op): Fold calls that were
>         initially created for the assignment of a variable-sized object.
>         * gimple-ssa-store-merging.c (struct merged_store_group): Document
>         STRING_CST for rhs_code field.
>         Add string_concatenation boolean field.
>         (merged_store_group::merged_store_group): Initialize it as well as
>         bit_insertion here.
>         (merged_store_group::do_merge): Set it upon seeing a STRING_CST.  Also
>         set bit_insertion here upon seeing a BIT_INSERT_EXPR.
>         (merged_store_group::apply_stores): Clear it for small regions.  Do not
>         create a power-of-2-sized buffer if it is still true.  And do not set
>         bit_insertion here again.
>         (encode_tree_to_bitpos): Deal with BLKmode for the expression.
>         (merged_store_group::can_be_merged_into): Deal with STRING_CST.
>         (imm_store_chain_info::coalesce_immediate_stores): Set bit_insertion
>         to true after changing MEM_REF stores into BIT_INSERT_EXPR stores.
>         (count_multiple_uses): Return 0 for STRING_CST.
>         (split_group): Do not split the group for a string concatenation.
>         (imm_store_chain_info::output_merged_store): Constify and rename some
>         local variables.  Build an array type as destination type for a string
>         concatenation, as well as a zero mask, and call build_string to build
>         the source.
>         (lhs_valid_for_store_merging_p): Return true for VIEW_CONVERT_EXPR.
>         (pass_store_merging::process_store): Accept STRING_CST on the RHS.
>         * gimple.h (gimple_call_alloca_for_var_p): New accessor function.
>         * gimplify.c (gimplify_modify_expr_to_memcpy): Set alloca_for_var.
>         * tree.h (CALL_ALLOCA_FOR_VAR_P): Document it for BUILT_IN_MEMCPY.
>
>
> 2020-07-01  Eric Botcazou  <ebotcazou@adacore.com>
>
>         * gnat.dg/opt87.adb: New test.
>         * gnat.dg/opt87_pkg.ad[sb]: New helper.
>
> --
> Eric Botcazou
Eric Botcazou July 1, 2020, 10:32 a.m. UTC | #2
> +         && TREE_CODE (TREE_OPERAND (src, 0)) == STRING_CST
> +         && tree_int_cst_equal
> +            (TYPE_SIZE_UNIT (TREE_TYPE (TREE_OPERAND (src, 0))), len))
> +       {
> 
> I guess we miss a BITS_PER_UNIT == 8 check here?

OK, added.

> +         gimple_set_vuse (new_stmt, gimple_vuse (stmt));
> +         gimple_set_vdef (new_stmt, gimple_vdef (stmt));
> +         if (gimple_vdef (new_stmt)
> +             && TREE_CODE (gimple_vdef (new_stmt)) == SSA_NAME)
> +           SSA_NAME_DEF_STMT (gimple_vdef (new_stmt)) = new_stmt;
> 
> you can use gimple_move_vops (new_stmt, stmt) for this.

Indeed, changed.

> I wonder if there are objects beside STRING_CSTs that could have their
> sizes fixed via inlining, thus, whether you can omit the == STRING_CST
> check?  That is, I see this change independent of the store-merging
> optimization.

Will that not interfere with the subsequent cases though?  For STRING_CSTs, 
this is not the case since they are not var_decl_component_p.  I can replace 
STRING_CST with !var_decl_component_p but I'm not sure what we will gain.

> Otherwise the memcpy folding part looks OK to me, I skimmed the
> store-merging change and didn't find anything suspicious but I wonder
> whether not special-casing STRING_CSTs would actually simplify
> the code?

This would mean implementing 1), though, and in particular rejecting or 
splitting long strings, which is precisely what we do not want to do in Ada.
Richard Biener July 1, 2020, noon UTC | #3
On Wed, Jul 1, 2020 at 12:32 PM Eric Botcazou <botcazou@adacore.com> wrote:
>
> > +         && TREE_CODE (TREE_OPERAND (src, 0)) == STRING_CST
> > +         && tree_int_cst_equal
> > +            (TYPE_SIZE_UNIT (TREE_TYPE (TREE_OPERAND (src, 0))), len))
> > +       {
> >
> > I guess we miss a BITS_PER_UNIT == 8 check here?
>
> OK, added.
>
> > +         gimple_set_vuse (new_stmt, gimple_vuse (stmt));
> > +         gimple_set_vdef (new_stmt, gimple_vdef (stmt));
> > +         if (gimple_vdef (new_stmt)
> > +             && TREE_CODE (gimple_vdef (new_stmt)) == SSA_NAME)
> > +           SSA_NAME_DEF_STMT (gimple_vdef (new_stmt)) = new_stmt;
> >
> > you can use gimple_move_vops (new_stmt, stmt) for this.
>
> Indeed, changed.
>
> > I wonder if there are objects beside STRING_CSTs that could have their
> > sizes fixed via inlining, thus, whether you can omit the == STRING_CST
> > check?  That is, I see this change independent of the store-merging
> > optimization.
>
> Will that not interfere with the subsequent cases though?  For STRING_CSTs,
> this is not the case since they are not var_decl_component_p.  I can replace
> STRING_CST with !var_decl_component_p but I'm not sure what we will gain.

Hmm, that's a good question - so would your patch be replaceable by
simply amending var_decl_component_p by

  (var_decl_component_p (TREE_OPERAND (src, 0))
   || TREE_CODE (TREE_OPERAND (src, 0)) == STRING_CST)

?

> > Otherwise the memcpy folding part looks OK to me, I skimmed the
> > store-merging change and didn't find anything suspicious but I wonder
> > whether not special-casing STRING_CSTs would actually simplify
> > the code?
>
> This would mean implementing the 1) though and, in particular, rejecting or
> splitting long strings, which is precisely what we do not want to do in Ada.

Oh, indeed, yeah - so the separate handling looks better.  I thought
of cases like merging "x\0" with a subsequent short integer but those may
be already handled since they are power-of-two sized.

Richard.

> --
> Eric Botcazou
Eric Botcazou July 1, 2020, 9:05 p.m. UTC | #4
> Hmm, that's a good question - so would your patch be replaceable by
> simply amending var_decl_component_p by
> 
>   (var_decl_component_p (TREE_OPERAND (src, 0))
> 
>    || TREE_CODE (TREE_OPERAND (src, 0)) == STRING_CST)
> 
> ?

The gimple_call_alloca_for_var_p (stmt) kludge is still needed though and the 
end of the transformation needs to be adjusted, so I'm not sure it's better.
Richard Biener July 2, 2020, 8:40 a.m. UTC | #5
On Wed, Jul 1, 2020 at 11:06 PM Eric Botcazou <botcazou@adacore.com> wrote:
>
> > Hmm, that's a good question - so would your patch be replaceable by
> > simply amending var_decl_component_p by
> >
> >   (var_decl_component_p (TREE_OPERAND (src, 0))
> >
> >    || TREE_CODE (TREE_OPERAND (src, 0)) == STRING_CST)
> >
> > ?
>
> The gimple_call_alloca_for_var_p (stmt) kludge is still needed though and the
> end of the transformation needs to be adjusted, so I'm not sure it's better.

Sorry for the ping-pong but why's using a new char[len] type problematic?

That said, I do like p2 more even if we need to special-case STRING_CST
sources at the end for store-merging to be happy.

Thanks,
Richard.

> --
> Eric Botcazou
Eric Botcazou July 2, 2020, 10:27 a.m. UTC | #6
> Sorry for the ping-pong but why's using a new char[len] type problematic?

Because the type may be incompatible with that of the STRING_CST, so we would 
need to build a new STRING_CST with the new type.

> That said, I do like p2 more even if we need to special-case STRING_CST
> sources at the end for store-merging to be happy.

OK.
Richard Biener July 2, 2020, 10:39 a.m. UTC | #7
On Thu, Jul 2, 2020 at 12:27 PM Eric Botcazou <botcazou@adacore.com> wrote:
>
> > Sorry for the ping-pong but why's using a new char[len] type problematic?
>
> Because the type may be incompatible with that of the STRING_CST, so we would
> need to build a new STRING_CST with the new type.

I see.

> > That said, I do like p2 more even if we need to special-case STRING_CST
> > sources at the end for store-merging to be happy.
>
> OK.

So this variant combined with the rest of the patch is OK then.

Thanks,
Richard.

>
> --
> Eric Botcazou
Eric Botcazou July 2, 2020, 7:08 p.m. UTC | #8
> So this variant combined with the rest of the patch is OK then.

Thanks.  It occurred to me that using string_constant might be slightly better 
(it is already used by gimple_fold_builtin_memchr in the same file).


	* gimple-fold.c (gimple_fold_builtin_memory_op): Fold calls that were
	initially created for the assignment of a variable-sized destination
	and whose source is now a string constant.
Richard Biener July 3, 2020, 11:22 a.m. UTC | #9
On Thu, Jul 2, 2020 at 9:08 PM Eric Botcazou <botcazou@adacore.com> wrote:
>
> > So this variant combined with the rest of the patch is OK then.
>
> Thanks.  It occurred to me that using string_constant might be slightly better
> (it is already used by gimple_fold_builtin_memchr in the same file).

OK.

Thanks,
Richard.

>
>         * gimple-fold.c (gimple_fold_builtin_memory_op): Fold calls that were
>         initially created for the assignment of a variable-sized destination
>         and whose source is now a string constant.
>
> --
> Eric Botcazou

Patch

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 4e3de95d2d2..9a467e80aae 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -840,6 +840,35 @@  gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
 	    }
 	}
 
+      /* If this was a variable-sized copy operation that has been turned
+	 into a fixed-size string store, typically after inlining, then do
+	 the fixed-size store explicitly.  We might be able to concatenate
+	 several of them later into a single string store.  */
+      if (tree_fits_uhwi_p (len)
+	  && gimple_call_alloca_for_var_p (stmt)
+	  && TREE_CODE (src) == ADDR_EXPR
+	  && TREE_CODE (TREE_OPERAND (src, 0)) == STRING_CST
+	  && tree_int_cst_equal
+	     (TYPE_SIZE_UNIT (TREE_TYPE (TREE_OPERAND (src, 0))), len))
+	{
+	  tree new_src = TREE_OPERAND (src, 0);
+	  tree new_dest
+	    = fold_build2 (MEM_REF, TREE_TYPE (new_src), dest, off0);
+	  gimple *new_stmt = gimple_build_assign (new_dest, new_src);
+	  gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+	  gimple_set_vdef (new_stmt, gimple_vdef (stmt));
+	  if (gimple_vdef (new_stmt)
+	      && TREE_CODE (gimple_vdef (new_stmt)) == SSA_NAME)
+	    SSA_NAME_DEF_STMT (gimple_vdef (new_stmt)) = new_stmt;
+	  if (!lhs)
+	    {
+	      gsi_replace (gsi, new_stmt, false);
+	      return true;
+	    }
+	  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+	  goto done;
+	}
+
       if (code == BUILT_IN_MEMMOVE)
 	{
 	  /* Both DEST and SRC must be pointer types.
diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c
index 3ab614148a7..bac6dd05e3c 100644
--- a/gcc/gimple-ssa-store-merging.c
+++ b/gcc/gimple-ssa-store-merging.c
@@ -1362,9 +1362,9 @@  public:
   unsigned HOST_WIDE_INT bitregion_end;
   gimple *stmt;
   unsigned int order;
-  /* INTEGER_CST for constant stores, MEM_REF for memory copy,
-     BIT_*_EXPR for logical bitwise operation, BIT_INSERT_EXPR
-     for bit insertion.
+  /* INTEGER_CST for constant store, STRING_CST for string store,
+     MEM_REF for memory copy, BIT_*_EXPR for logical bitwise operation,
+     BIT_INSERT_EXPR for bit insertion.
      LROTATE_EXPR if it can be only bswap optimized and
      ops are not really meaningful.
      NOP_EXPR if bswap optimization detected identity, ops
@@ -1440,6 +1440,7 @@  public:
   unsigned int first_order;
   unsigned int last_order;
   bool bit_insertion;
+  bool string_concatenation;
   bool only_constants;
   unsigned int first_nonmergeable_order;
   int lp_nr;
@@ -1650,7 +1651,10 @@  encode_tree_to_bitpos (tree expr, unsigned char *ptr, int bitlen, int bitpos,
     {
       fixed_size_mode mode
 	= as_a <fixed_size_mode> (TYPE_MODE (TREE_TYPE (expr)));
-      byte_size = GET_MODE_SIZE (mode);
+      byte_size
+	= mode == BLKmode
+	? tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (expr)))
+	: GET_MODE_SIZE (mode);
     }
   /* Allocate an extra byte so that we have space to shift into.  */
   byte_size++;
@@ -1807,7 +1811,8 @@  merged_store_group::merged_store_group (store_immediate_info *info)
      width has been finalized.  */
   val = NULL;
   mask = NULL;
-  bit_insertion = false;
+  bit_insertion = info->rhs_code == BIT_INSERT_EXPR;
+  string_concatenation = info->rhs_code == STRING_CST;
   only_constants = info->rhs_code == INTEGER_CST;
   first_nonmergeable_order = ~0U;
   lp_nr = info->lp_nr;
@@ -1860,12 +1865,12 @@  merged_store_group::can_be_merged_into (store_immediate_info *info)
   if (info->rhs_code == stores[0]->rhs_code)
     return true;
 
-  /* BIT_INSERT_EXPR is compatible with INTEGER_CST.  */
+  /* BIT_INSERT_EXPR is compatible with INTEGER_CST if no STRING_CST.  */
   if (info->rhs_code == BIT_INSERT_EXPR && stores[0]->rhs_code == INTEGER_CST)
-    return true;
+    return !string_concatenation;
 
   if (stores[0]->rhs_code == BIT_INSERT_EXPR && info->rhs_code == INTEGER_CST)
-    return true;
+    return !string_concatenation;
 
   /* We can turn MEM_REF into BIT_INSERT_EXPR for bit-field stores, but do it
      only for small regions since this can generate a lot of instructions.  */
@@ -1875,7 +1880,7 @@  merged_store_group::can_be_merged_into (store_immediate_info *info)
       && info->bitregion_start == stores[0]->bitregion_start
       && info->bitregion_end == stores[0]->bitregion_end
       && info->bitregion_end - info->bitregion_start <= MAX_FIXED_MODE_SIZE)
-    return true;
+    return !string_concatenation;
 
   if (stores[0]->rhs_code == MEM_REF
       && (info->rhs_code == INTEGER_CST
@@ -1883,7 +1888,18 @@  merged_store_group::can_be_merged_into (store_immediate_info *info)
       && info->bitregion_start == stores[0]->bitregion_start
       && info->bitregion_end == stores[0]->bitregion_end
       && info->bitregion_end - info->bitregion_start <= MAX_FIXED_MODE_SIZE)
-    return true;
+    return !string_concatenation;
+
+  /* STRING_CST is compatible with INTEGER_CST if no BIT_INSERT_EXPR.  */
+  if (info->rhs_code == STRING_CST
+      && stores[0]->rhs_code == INTEGER_CST
+      && stores[0]->bitsize == CHAR_BIT)
+    return !bit_insertion;
+
+  if (stores[0]->rhs_code == STRING_CST
+      && info->rhs_code == INTEGER_CST
+      && info->bitsize == CHAR_BIT)
+    return !bit_insertion;
 
   return false;
 }
@@ -1932,6 +1948,21 @@  merged_store_group::do_merge (store_immediate_info *info)
       first_order = info->order;
       first_stmt = stmt;
     }
+
+  /* We need to use extraction if there is any bit-field.  */
+  if (info->rhs_code == BIT_INSERT_EXPR)
+    {
+      bit_insertion = true;
+      gcc_assert (!string_concatenation);
+    }
+
+  /* We need to use concatenation if there is any string.  */
+  if (info->rhs_code == STRING_CST)
+    {
+      string_concatenation = true;
+      gcc_assert (!bit_insertion);
+    }
+
   if (info->rhs_code != INTEGER_CST)
     only_constants = false;
 }
@@ -1972,6 +2003,9 @@  merged_store_group::merge_overlapping (store_immediate_info *info)
 bool
 merged_store_group::apply_stores ()
 {
+  store_immediate_info *info;
+  unsigned int i;
+
   /* Make sure we have more than one store in the group, otherwise we cannot
      merge anything.  */
   if (bitregion_start % BITS_PER_UNIT != 0
@@ -1979,16 +2013,23 @@  merged_store_group::apply_stores ()
       || stores.length () == 1)
     return false;
 
-  stores.qsort (sort_by_order);
-  store_immediate_info *info;
-  unsigned int i;
+  buf_size = (bitregion_end - bitregion_start) / BITS_PER_UNIT;
+
+  /* Really do string concatenation for large strings only.  */
+  if (buf_size <= MOVE_MAX)
+    string_concatenation = false;
+
   /* Create a power-of-2-sized buffer for native_encode_expr.  */
-  buf_size = 1 << ceil_log2 ((bitregion_end - bitregion_start) / BITS_PER_UNIT);
+  if (!string_concatenation)
+    buf_size = 1 << ceil_log2 (buf_size);
+
   val = XNEWVEC (unsigned char, 2 * buf_size);
   mask = val + buf_size;
   memset (val, 0, buf_size);
   memset (mask, ~0U, buf_size);
 
+  stores.qsort (sort_by_order);
+
   FOR_EACH_VEC_ELT (stores, i, info)
     {
       unsigned int pos_in_buffer = info->bitpos - bitregion_start;
@@ -2000,14 +2041,9 @@  merged_store_group::apply_stores ()
       else
 	cst = NULL_TREE;
       bool ret = true;
-      if (cst)
-	{
-	  if (info->rhs_code == BIT_INSERT_EXPR)
-	    bit_insertion = true;
-	  else
-	    ret = encode_tree_to_bitpos (cst, val, info->bitsize,
-					 pos_in_buffer, buf_size);
-	}
+      if (cst && info->rhs_code != BIT_INSERT_EXPR)
+	ret = encode_tree_to_bitpos (cst, val, info->bitsize, pos_in_buffer,
+				     buf_size);
       unsigned char *m = mask + (pos_in_buffer / BITS_PER_UNIT);
       if (BYTES_BIG_ENDIAN)
 	clear_bit_region_be (m, (BITS_PER_UNIT - 1
@@ -2029,6 +2065,8 @@  merged_store_group::apply_stores ()
 	      dump_char_array (dump_file, mask, buf_size);
 	      if (bit_insertion)
 		fputs ("  bit insertion is required\n", dump_file);
+	      if (string_concatenation)
+		fputs ("  string concatenation is required\n", dump_file);
 	    }
 	  else
 	    fprintf (dump_file, "Failed to merge stores\n");
@@ -2916,6 +2954,7 @@  imm_store_chain_info::coalesce_immediate_stores ()
 		      infoj->ops[0].val = gimple_assign_rhs1 (infoj->stmt);
 		      infoj->ops[0].base_addr = NULL_TREE;
 		    }
+		  merged_store->bit_insertion = true;
 		}
 	      if ((infof->ops[0].base_addr
 		   ? compatible_load_p (merged_store, info, base_addr, 0)
@@ -3143,6 +3182,7 @@  count_multiple_uses (store_immediate_info *info)
   switch (info->rhs_code)
     {
     case INTEGER_CST:
+    case STRING_CST:
       return 0;
     case BIT_AND_EXPR:
     case BIT_IOR_EXPR:
@@ -3238,13 +3278,14 @@  split_group (merged_store_group *group, bool allow_unaligned_store,
 
   gcc_assert ((size % BITS_PER_UNIT == 0) && (pos % BITS_PER_UNIT == 0));
 
+  /* For bswap framework using sets of stores, all the checking has been done
+     earlier in try_coalesce_bswap and the result always needs to be emitted
+     as a single store.  Likewise for string concatenation.  */
   if (group->stores[0]->rhs_code == LROTATE_EXPR
-      || group->stores[0]->rhs_code == NOP_EXPR)
+      || group->stores[0]->rhs_code == NOP_EXPR
+      || group->string_concatenation)
     {
       gcc_assert (!bzero_first);
-      /* For bswap framework using sets of stores, all the checking
-	 has been done earlier in try_coalesce_bswap and needs to be
-	 emitted as a single store.  */
       if (total_orig)
 	{
 	  /* Avoid the old/new stmt count heuristics.  It should be
@@ -3658,16 +3699,12 @@  invert_op (split_store *split_store, int idx, tree int_type, tree &mask)
 bool
 imm_store_chain_info::output_merged_store (merged_store_group *group)
 {
-  split_store *split_store;
-  unsigned int i;
-  unsigned HOST_WIDE_INT start_byte_pos
+  const unsigned HOST_WIDE_INT start_byte_pos
     = group->bitregion_start / BITS_PER_UNIT;
-
   unsigned int orig_num_stmts = group->stores.length ();
   if (orig_num_stmts < 2)
     return false;
 
-  auto_vec<class split_store *, 32> split_stores;
   bool allow_unaligned_store
     = !STRICT_ALIGNMENT && param_store_merging_allow_unaligned;
   bool allow_unaligned_load = allow_unaligned_store;
@@ -3676,6 +3713,7 @@  imm_store_chain_info::output_merged_store (merged_store_group *group)
   unsigned int num_clobber_stmts = 0;
   if (group->stores[0]->rhs_code == INTEGER_CST)
     {
+      unsigned int i;
       FOR_EACH_VEC_ELT (group->stores, i, store)
 	if (gimple_clobber_p (store->stmt))
 	  num_clobber_stmts++;
@@ -3725,7 +3763,10 @@  imm_store_chain_info::output_merged_store (merged_store_group *group)
       if ((pass_min & 2) == 0)
 	bzero_first = false;
     }
-  unsigned total_orig, total_new;
+
+  auto_vec<class split_store *, 32> split_stores;
+  split_store *split_store;
+  unsigned total_orig, total_new, i;
   split_group (group, allow_unaligned_store, allow_unaligned_load, bzero_first,
 	       &split_stores, &total_orig, &total_new);
 
@@ -3940,12 +3981,14 @@  imm_store_chain_info::output_merged_store (merged_store_group *group)
 
   FOR_EACH_VEC_ELT (split_stores, i, split_store)
     {
-      unsigned HOST_WIDE_INT try_size = split_store->size;
-      unsigned HOST_WIDE_INT try_pos = split_store->bytepos;
-      unsigned HOST_WIDE_INT try_bitpos = try_pos * BITS_PER_UNIT;
-      unsigned HOST_WIDE_INT align = split_store->align;
+      const unsigned HOST_WIDE_INT try_size = split_store->size;
+      const unsigned HOST_WIDE_INT try_pos = split_store->bytepos;
+      const unsigned HOST_WIDE_INT try_bitpos = try_pos * BITS_PER_UNIT;
+      const unsigned HOST_WIDE_INT try_align = split_store->align;
+      const unsigned HOST_WIDE_INT try_offset = try_pos - start_byte_pos;
       tree dest, src;
       location_t loc;
+
       if (split_store->orig)
 	{
 	  /* If there is just a single non-clobber constituent store
@@ -3972,12 +4015,20 @@  imm_store_chain_info::output_merged_store (merged_store_group *group)
 	    orig_stmts.safe_push (info->stmt);
 	  tree offset_type
 	    = get_alias_type_for_stmts (orig_stmts, false, &clique, &base);
+	  tree dest_type;
 	  loc = get_location_for_stmts (orig_stmts);
 	  orig_stmts.truncate (0);
 
-	  tree int_type = build_nonstandard_integer_type (try_size, UNSIGNED);
-	  int_type = build_aligned_type (int_type, align);
-	  dest = fold_build2 (MEM_REF, int_type, addr,
+	  if (group->string_concatenation)
+	    dest_type
+	      = build_array_type_nelts (char_type_node,
+					try_size / BITS_PER_UNIT);
+	  else
+	    {
+	      dest_type = build_nonstandard_integer_type (try_size, UNSIGNED);
+	      dest_type = build_aligned_type (dest_type, try_align);
+	    }
+	  dest = fold_build2 (MEM_REF, dest_type, addr,
 			      build_int_cst (offset_type, try_pos));
 	  if (TREE_CODE (dest) == MEM_REF)
 	    {
@@ -3986,12 +4037,11 @@  imm_store_chain_info::output_merged_store (merged_store_group *group)
 	    }
 
 	  tree mask;
-	  if (bswap_res)
+	  if (bswap_res || group->string_concatenation)
 	    mask = integer_zero_node;
 	  else
-	    mask = native_interpret_expr (int_type,
-					  group->mask + try_pos
-					  - start_byte_pos,
+	    mask = native_interpret_expr (dest_type,
+					  group->mask + try_offset,
 					  group->buf_size);
 
 	  tree ops[2];
@@ -4002,6 +4052,12 @@  imm_store_chain_info::output_merged_store (merged_store_group *group)
 	      store_operand_info &op = split_store->orig_stores[0]->ops[j];
 	      if (bswap_res)
 		ops[j] = bswap_res;
+	      else if (group->string_concatenation)
+		{
+		  ops[j] = build_string (try_size / BITS_PER_UNIT,
+					 (const char *) group->val + try_offset);
+		  TREE_TYPE (ops[j]) = dest_type;
+		}
 	      else if (op.base_addr)
 		{
 		  FOR_EACH_VEC_ELT (split_store->orig_stores, k, info)
@@ -4043,8 +4099,7 @@  imm_store_chain_info::output_merged_store (merged_store_group *group)
 		       warnings in that case.  */
 		    TREE_NO_WARNING (ops[j]) = 1;
 
-		  stmt = gimple_build_assign (make_ssa_name (int_type),
-					      ops[j]);
+		  stmt = gimple_build_assign (make_ssa_name (dest_type), ops[j]);
 		  gimple_set_location (stmt, load_loc);
 		  if (gsi_bb (load_gsi[j]))
 		    {
@@ -4059,10 +4114,10 @@  imm_store_chain_info::output_merged_store (merged_store_group *group)
 		  ops[j] = gimple_assign_lhs (stmt);
 		  tree xor_mask;
 		  enum tree_code inv_op
-		    = invert_op (split_store, j, int_type, xor_mask);
+		    = invert_op (split_store, j, dest_type, xor_mask);
 		  if (inv_op != NOP_EXPR)
 		    {
-		      stmt = gimple_build_assign (make_ssa_name (int_type),
+		      stmt = gimple_build_assign (make_ssa_name (dest_type),
 						  inv_op, ops[j], xor_mask);
 		      gimple_set_location (stmt, load_loc);
 		      ops[j] = gimple_assign_lhs (stmt);
@@ -4075,9 +4130,8 @@  imm_store_chain_info::output_merged_store (merged_store_group *group)
 		    }
 		}
 	      else
-		ops[j] = native_interpret_expr (int_type,
-						group->val + try_pos
-						- start_byte_pos,
+		ops[j] = native_interpret_expr (dest_type,
+						group->val + try_offset,
 						group->buf_size);
 	    }
 
@@ -4096,7 +4150,7 @@  imm_store_chain_info::output_merged_store (merged_store_group *group)
 	      orig_stmts.truncate (0);
 
 	      stmt
-		= gimple_build_assign (make_ssa_name (int_type),
+		= gimple_build_assign (make_ssa_name (dest_type),
 				       split_store->orig_stores[0]->rhs_code,
 				       ops[0], ops[1]);
 	      gimple_set_location (stmt, bit_loc);
@@ -4115,10 +4169,10 @@  imm_store_chain_info::output_merged_store (merged_store_group *group)
 	      src = gimple_assign_lhs (stmt);
 	      tree xor_mask;
 	      enum tree_code inv_op;
-	      inv_op = invert_op (split_store, 2, int_type, xor_mask);
+	      inv_op = invert_op (split_store, 2, dest_type, xor_mask);
 	      if (inv_op != NOP_EXPR)
 		{
-		  stmt = gimple_build_assign (make_ssa_name (int_type),
+		  stmt = gimple_build_assign (make_ssa_name (dest_type),
 					      inv_op, src, xor_mask);
 		  gimple_set_location (stmt, bit_loc);
 		  if (load_addr[1] == NULL_TREE && gsi_bb (load_gsi[0]))
@@ -4138,17 +4192,17 @@  imm_store_chain_info::output_merged_store (merged_store_group *group)
 		  gimple_seq_add_stmt_without_update (&seq, stmt);
 		  src = gimple_assign_lhs (stmt);
 		}
-	      if (!useless_type_conversion_p (int_type, TREE_TYPE (src)))
+	      if (!useless_type_conversion_p (dest_type, TREE_TYPE (src)))
 		{
-		  stmt = gimple_build_assign (make_ssa_name (int_type),
+		  stmt = gimple_build_assign (make_ssa_name (dest_type),
 					      NOP_EXPR, src);
 		  gimple_seq_add_stmt_without_update (&seq, stmt);
 		  src = gimple_assign_lhs (stmt);
 		}
-	      inv_op = invert_op (split_store, 2, int_type, xor_mask);
+	      inv_op = invert_op (split_store, 2, dest_type, xor_mask);
 	      if (inv_op != NOP_EXPR)
 		{
-		  stmt = gimple_build_assign (make_ssa_name (int_type),
+		  stmt = gimple_build_assign (make_ssa_name (dest_type),
 					      inv_op, src, xor_mask);
 		  gimple_set_location (stmt, loc);
 		  gimple_seq_add_stmt_without_update (&seq, stmt);
@@ -4206,18 +4260,18 @@  imm_store_chain_info::output_merged_store (merged_store_group *group)
 		    tem = gimple_build (&seq, loc,
 					RSHIFT_EXPR, TREE_TYPE (tem), tem,
 					build_int_cst (NULL_TREE, -shift));
-		  tem = gimple_convert (&seq, loc, int_type, tem);
+		  tem = gimple_convert (&seq, loc, dest_type, tem);
 		  if (shift > 0)
 		    tem = gimple_build (&seq, loc,
-					LSHIFT_EXPR, int_type, tem,
+					LSHIFT_EXPR, dest_type, tem,
 					build_int_cst (NULL_TREE, shift));
 		  src = gimple_build (&seq, loc,
-				      BIT_IOR_EXPR, int_type, tem, src);
+				      BIT_IOR_EXPR, dest_type, tem, src);
 		}
 
 	  if (!integer_zerop (mask))
 	    {
-	      tree tem = make_ssa_name (int_type);
+	      tree tem = make_ssa_name (dest_type);
 	      tree load_src = unshare_expr (dest);
 	      /* The load might load some or all bits uninitialized,
 		 avoid -W*uninitialized warnings in that case.
@@ -4233,28 +4287,28 @@  imm_store_chain_info::output_merged_store (merged_store_group *group)
 
 	      /* FIXME: If there is a single chunk of zero bits in mask,
 		 perhaps use BIT_INSERT_EXPR instead?  */
-	      stmt = gimple_build_assign (make_ssa_name (int_type),
+	      stmt = gimple_build_assign (make_ssa_name (dest_type),
 					  BIT_AND_EXPR, tem, mask);
 	      gimple_set_location (stmt, loc);
 	      gimple_seq_add_stmt_without_update (&seq, stmt);
 	      tem = gimple_assign_lhs (stmt);
 
 	      if (TREE_CODE (src) == INTEGER_CST)
-		src = wide_int_to_tree (int_type,
+		src = wide_int_to_tree (dest_type,
 					wi::bit_and_not (wi::to_wide (src),
 							 wi::to_wide (mask)));
 	      else
 		{
 		  tree nmask
-		    = wide_int_to_tree (int_type,
+		    = wide_int_to_tree (dest_type,
 					wi::bit_not (wi::to_wide (mask)));
-		  stmt = gimple_build_assign (make_ssa_name (int_type),
+		  stmt = gimple_build_assign (make_ssa_name (dest_type),
 					      BIT_AND_EXPR, src, nmask);
 		  gimple_set_location (stmt, loc);
 		  gimple_seq_add_stmt_without_update (&seq, stmt);
 		  src = gimple_assign_lhs (stmt);
 		}
-	      stmt = gimple_build_assign (make_ssa_name (int_type),
+	      stmt = gimple_build_assign (make_ssa_name (dest_type),
 					  BIT_IOR_EXPR, tem, src);
 	      gimple_set_location (stmt, loc);
 	      gimple_seq_add_stmt_without_update (&seq, stmt);
@@ -4451,6 +4505,7 @@  lhs_valid_for_store_merging_p (tree lhs)
     case BIT_FIELD_REF:
     case COMPONENT_REF:
     case MEM_REF:
+    case VIEW_CONVERT_EXPR:
       return true;
     default:
       return false;
@@ -4701,14 +4756,17 @@  pass_store_merging::process_store (gimple *stmt)
   store_operand_info ops[2];
   if (invalid)
     ;
+  else if (TREE_CODE (rhs) == STRING_CST)
+    {
+      rhs_code = STRING_CST;
+      ops[0].val = rhs;
+    }
   else if (rhs_valid_for_store_merging_p (rhs))
     {
       rhs_code = INTEGER_CST;
       ops[0].val = rhs;
     }
-  else if (TREE_CODE (rhs) != SSA_NAME)
-    invalid = true;
-  else
+  else if (TREE_CODE (rhs) == SSA_NAME)
     {
       gimple *def_stmt = SSA_NAME_DEF_STMT (rhs), *def_stmt1, *def_stmt2;
       if (!is_gimple_assign (def_stmt))
@@ -4820,6 +4878,8 @@  pass_store_merging::process_store (gimple *stmt)
 	  invalid = false;
 	}
     }
+  else
+    invalid = true;
 
   unsigned HOST_WIDE_INT const_bitsize, const_bitpos;
   unsigned HOST_WIDE_INT const_bitregion_start, const_bitregion_end;
diff --git a/gcc/gimple.h b/gcc/gimple.h
index ca7fec6247e..d64c47a7d0d 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -3472,6 +3472,13 @@  gimple_call_alloca_for_var_p (gcall *s)
   return (s->subcode & GF_CALL_ALLOCA_FOR_VAR) != 0;
 }
 
+static inline bool
+gimple_call_alloca_for_var_p (gimple *s)
+{
+  const gcall *gc = GIMPLE_CHECK2<gcall *> (s);
+  return (gc->subcode & GF_CALL_ALLOCA_FOR_VAR) != 0;
+}
+
 /* If BY_DESCRIPTOR_P is true, GIMPLE_CALL S is an indirect call for which
    pointers to nested function are descriptors instead of trampolines.  */
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 968fae2f1a3..09a30cf69a5 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -4329,6 +4329,7 @@  gimplify_modify_expr_to_memcpy (tree *expr_p, tree size, bool want_value,
   t = builtin_decl_implicit (BUILT_IN_MEMCPY);
 
   gs = gimple_build_call (t, 3, to_ptr, from_ptr, size);
+  gimple_call_set_alloca_for_var (gs, true);
 
   if (want_value)
     {
diff --git a/gcc/tree.h b/gcc/tree.h
index a74872f5f3e..cf546ed9491 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -925,7 +925,9 @@  extern void omp_clause_range_check_failed (const_tree, const char *, int,
 #define CALL_FROM_THUNK_P(NODE) (CALL_EXPR_CHECK (NODE)->base.protected_flag)
 
 /* In a CALL_EXPR, if the function being called is BUILT_IN_ALLOCA, means that
-   it has been built for the declaration of a variable-sized object.  */
+   it has been built for the declaration of a variable-sized object and, if the
+   function being called is BUILT_IN_MEMCPY, means that it has been built for
+   the assignment of a variable-sized object.  */
 #define CALL_ALLOCA_FOR_VAR_P(NODE) \
   (CALL_EXPR_CHECK (NODE)->base.protected_flag)