
Use tail calls to memcpy/memset even for structure assignments (PR target/41455, PR target/82935)

Message ID 20171206210447.GJ2353@tucnak

Commit Message

Jakub Jelinek Dec. 6, 2017, 9:04 p.m. UTC
Hi!

Aggregate assignments and clears aren't represented as calls in GIMPLE,
and while they often expand inline, sometimes we emit libcalls for them.
This patch allows us to tail call those libcalls if there is nothing
after them.  The patch changes the tailcall pass so that it recognizes
a = b; and c = {}; statements under certain conditions as potential tail
calls returning void, and if it finds good tail call candidates, it marks
them specially.  Because we have only a single flag bit left for
GIMPLE_ASSIGN, I've decided to wrap the rhs1 into a new internal call:
a = b; is transformed into a = TAILCALL_ASSIGN (b); and
c = {}; into c = TAILCALL_ASSIGN ();
The rest of the patch propagates the flag (the emit_block_move or
clear_storage may use a tail call if it is the last thing emitted) down
through expand_assignment and the functions it calls.

Those functions already take 1-3 other flags, so instead of adding
another bool to all of them (next to nontemporal, call_param_p and
reverse) I've decided to pass around a bitmask of flags.
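For illustration, this is the kind of code that benefits (a hedged
sketch, not part of the patch or its testcase; struct big, copy_big and
clear_big are made-up names, and the 1024-byte size is only meant to be
large enough that the copy/clear typically expand as memcpy/memset
libcalls rather than inline moves):

```c
#include <string.h>
#include <assert.h>

/* Large enough that the aggregate copy/clear below are usually expanded
   as memcpy/memset libcalls (the exact threshold is target- and
   option-dependent).  */
struct big { char data[1024]; };

/* The aggregate copy is the last statement, so with the patch the
   memcpy libcall can be emitted as a tail call.  */
void
copy_big (struct big *dst, const struct big *src)
{
  *dst = *src;
}

/* Likewise an aggregate clear, tail calling memset.  */
void
clear_big (struct big *dst)
{
  *dst = (struct big) { 0 };
}
```

With the patch, at -O2 on x86_64 such functions can end in a jmp to
memcpy/memset instead of call followed by ret.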

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-12-06  Jakub Jelinek  <jakub@redhat.com>

	PR target/41455
	PR target/82935
	* internal-fn.def (TAILCALL_ASSIGN): New internal function.
	* internal-fn.c (expand_LAUNDER): Pass EXPAND_FLAG_NORMAL to
	expand_assignment.
	(expand_TAILCALL_ASSIGN): New function.
	* tree-tailcall.c (struct tailcall): Adjust comment.
	(find_tail_calls): Recognize also aggregate assignments and
	aggregate clearing as possible tail calls.  Use is_gimple_assign
	instead of gimple_code check.
	(optimize_tail_call): Rewrite aggregate assignments or aggregate
	clearing in tail call positions using IFN_TAILCALL_ASSIGN
	internal function.
	* tree-outof-ssa.c (insert_value_copy_on_edge): Adjust store_expr
	caller.
	* tree-chkp.c (chkp_expand_bounds_reset_for_mem): Adjust
	expand_assignment caller.
	* function.c (assign_parm_setup_reg): Likewise.
	* ubsan.c (ubsan_encode_value): Likewise.
	* cfgexpand.c (expand_call_stmt, expand_asm_stmt): Likewise.
	(expand_gimple_stmt_1): Likewise.  Fix up formatting.
	* calls.c (initialize_argument_information): Adjust store_expr caller.
	* expr.h (enum expand_flag): New.
	(expand_assignment): Replace bool argument with enum expand_flag.
	(store_expr_with_bounds, store_expr): Replace int, bool, bool arguments
	with enum expand_flag.
	* expr.c (expand_assignment): Replace nontemporal argument with flags.
	Assert no bits other than EXPAND_FLAG_NONTEMPORAL and
	EXPAND_FLAG_TAILCALL are set.  Adjust store_expr, store_field and
	store_expr_with_bounds callers.
	(store_expr_with_bounds): Replace call_param_p, nontemporal and
	reverse args with flags argument.  Adjust recursive calls.  Pass
	BLOCK_OP_TAILCALL to clear_storage and expand_block_move if
	EXPAND_FLAG_TAILCALL is set.  Call clear_storage directly for
	EXPAND_FLAG_TAILCALL assignments from empty CONSTRUCTOR.
	(store_expr): Replace call_param_p, nontemporal and reverse args
	with flags argument.  Adjust store_expr_with_bounds caller.
	(store_constructor_field): Adjust store_field caller.
	(store_constructor): Adjust store_expr and expand_assignment callers.
	(store_field): Replace nontemporal and reverse arguments with flags
	argument.  Adjust store_expr callers.  Pass BLOCK_OP_TAILCALL to
	emit_block_move if EXPAND_FLAG_TAILCALL is set.
	(expand_expr_real_2): Adjust store_expr and store_field callers.
	(expand_expr_real_1): Adjust store_expr and expand_assignment callers.

	* gcc.target/i386/pr41455.c: New test.


	Jakub

Comments

Richard Biener Dec. 15, 2017, 9:30 a.m. UTC
On Wed, 6 Dec 2017, Jakub Jelinek wrote:

> [... patch description snipped; quoted in full above ...]
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Hum, it doesn't look pretty ;)  Can we defer this to stage1 given
it's a long-standing issue and we have quite big changes going in still?

Thanks,
Richard.

> [... ChangeLog snipped; quoted in full above ...]
> 
> --- gcc/internal-fn.def.jj	2017-12-06 09:02:30.072952012 +0100
> +++ gcc/internal-fn.def	2017-12-06 16:56:20.958518104 +0100
> @@ -254,6 +254,11 @@ DEF_INTERNAL_FN (LAUNDER, ECF_LEAF | ECF
>  /* Divmod function.  */
>  DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
>  
> +/* Special markup for aggregate copy or clear that can be implemented
> +   using a tailcall.  lhs = rhs1; is represented by
> +   lhs = TAILCALL_ASSIGN (rhs1); and lhs = {}; by lhs = TAILCALL_ASSIGN ();  */
> +DEF_INTERNAL_FN (TAILCALL_ASSIGN, ECF_NOTHROW | ECF_LEAF, NULL)
> +
>  #undef DEF_INTERNAL_INT_FN
>  #undef DEF_INTERNAL_FLT_FN
>  #undef DEF_INTERNAL_FLT_FLOATN_FN
> --- gcc/internal-fn.c.jj	2017-12-06 09:02:29.968953307 +0100
> +++ gcc/internal-fn.c	2017-12-06 18:00:15.993826828 +0100
> @@ -2672,7 +2672,7 @@ expand_LAUNDER (internal_fn, gcall *call
>    if (!lhs)
>      return;
>  
> -  expand_assignment (lhs, gimple_call_arg (call, 0), false);
> +  expand_assignment (lhs, gimple_call_arg (call, 0), EXPAND_FLAG_NORMAL);
>  }
>  
>  /* Expand DIVMOD() using:
> @@ -2722,6 +2722,23 @@ expand_DIVMOD (internal_fn, gcall *call_
>  	       target, VOIDmode, EXPAND_NORMAL);
>  }
>  
> +/* Expand TAILCALL_ASSIGN.  */
> +
> +static void
> +expand_TAILCALL_ASSIGN (internal_fn, gcall *call_stmt)
> +{
> +  tree lhs = gimple_call_lhs (call_stmt);
> +  tree rhs;
> +  if (gimple_call_num_args (call_stmt) == 0)
> +    {
> +      rhs = build_constructor (TREE_TYPE (lhs), NULL);
> +      TREE_STATIC (rhs) = 1;
> +    }
> +  else
> +    rhs = gimple_call_arg (call_stmt, 0);
> +  expand_assignment (lhs, rhs, EXPAND_FLAG_TAILCALL);
> +}
> +
>  /* Expand a call to FN using the operands in STMT.  FN has a single
>     output operand and NARGS input operands.  */
>  
> --- gcc/tree-tailcall.c.jj	2017-12-06 09:02:30.031952522 +0100
> +++ gcc/tree-tailcall.c	2017-12-06 17:59:37.166299929 +0100
> @@ -106,7 +106,8 @@ along with GCC; see the file COPYING3.
>  
>  struct tailcall
>  {
> -  /* The iterator pointing to the call statement.  */
> +  /* The iterator pointing to the call (or aggregate copy that might be
> +     expanded as call) statement.  */
>    gimple_stmt_iterator call_gsi;
>  
>    /* True if it is a call to the current function.  */
> @@ -398,8 +399,7 @@ static void
>  find_tail_calls (basic_block bb, struct tailcall **ret)
>  {
>    tree ass_var = NULL_TREE, ret_var, func, param;
> -  gimple *stmt;
> -  gcall *call = NULL;
> +  gimple *stmt, *call = NULL;
>    gimple_stmt_iterator gsi, agsi;
>    bool tail_recursion;
>    struct tailcall *nw;
> @@ -428,7 +428,7 @@ find_tail_calls (basic_block bb, struct
>        /* Check for a call.  */
>        if (is_gimple_call (stmt))
>  	{
> -	  call = as_a <gcall *> (stmt);
> +	  call = stmt;
>  	  ass_var = gimple_call_lhs (call);
>  	  break;
>  	}
> @@ -440,6 +440,38 @@ find_tail_calls (basic_block bb, struct
>  	  && auto_var_in_fn_p (gimple_assign_rhs1 (stmt), cfun->decl))
>  	continue;
>  
> +      /* In addition to calls, allow aggregate copies that could be expanded
> +	 as memcpy or memset.  Pretend it has NULL lhs then.  */
> +      if (gimple_references_memory_p (stmt)
> +	  && gimple_assign_single_p (stmt)
> +	  && !gimple_has_volatile_ops (stmt)
> +	  && !gimple_assign_nontemporal_move_p (as_a <gassign *> (stmt))
> +	  && gimple_vdef (stmt)
> +	  && !is_gimple_reg_type (TREE_TYPE (gimple_assign_lhs (stmt))))
> +	{
> +	  tree lhs = gimple_assign_lhs (stmt);
> +	  if (TYPE_MODE (TREE_TYPE (lhs)) != BLKmode)
> +	    return;
> +	  tree rhs1 = gimple_assign_rhs1 (stmt);
> +	  if (auto_var_in_fn_p (get_base_address (lhs), cfun->decl))
> +	    return;
> +	  if (TREE_CODE (rhs1) == CONSTRUCTOR)
> +	    {
> +	      if (CONSTRUCTOR_NELTS (rhs1) != 0 || !TREE_STATIC (rhs1))
> +		return;
> +	    }
> +	  else if (auto_var_in_fn_p (get_base_address (rhs1), cfun->decl))
> +	    return;
> +	  if (reverse_storage_order_for_component_p (lhs)
> +	      || reverse_storage_order_for_component_p (rhs1))
> +	    return;
> +	  if (operand_equal_p (lhs, rhs1, 0))
> +	    return;
> +	  call = stmt;
> +	  ass_var = NULL_TREE;
> +	  break;
> +	}
> +
>        /* If the statement references memory or volatile operands, fail.  */
>        if (gimple_references_memory_p (stmt)
>  	  || gimple_has_volatile_ops (stmt))
> @@ -474,7 +506,7 @@ find_tail_calls (basic_block bb, struct
>  
>    /* We found the call, check whether it is suitable.  */
>    tail_recursion = false;
> -  func = gimple_call_fndecl (call);
> +  func = is_gimple_call (call) ? gimple_call_fndecl (call) : NULL_TREE;
>    if (func
>        && !DECL_BUILT_IN (func)
>        && recursive_call_p (current_function_decl, func))
> @@ -521,7 +553,9 @@ find_tail_calls (basic_block bb, struct
>  	  && auto_var_in_fn_p (var, cfun->decl)
>  	  && may_be_aliased (var)
>  	  && (ref_maybe_used_by_stmt_p (call, var)
> -	      || call_may_clobber_ref_p (call, var)))
> +	      || (is_gimple_call (call)
> +		  ? call_may_clobber_ref_p (as_a <gcall *> (call), var)
> +		  : refs_output_dependent_p (gimple_assign_lhs (call), var))))
>  	return;
>      }
>  
> @@ -560,7 +594,7 @@ find_tail_calls (basic_block bb, struct
>  	  || is_gimple_debug (stmt))
>  	continue;
>  
> -      if (gimple_code (stmt) != GIMPLE_ASSIGN)
> +      if (!is_gimple_assign (stmt))
>  	return;
>  
>        /* This is a gimple assign. */
> @@ -956,9 +990,31 @@ optimize_tail_call (struct tailcall *t,
>  
>    if (opt_tailcalls)
>      {
> -      gcall *stmt = as_a <gcall *> (gsi_stmt (t->call_gsi));
> -
> -      gimple_call_set_tail (stmt, true);
> +      gimple *stmt = gsi_stmt (t->call_gsi);
> +      if (gcall *call = dyn_cast <gcall *> (stmt))
> +	gimple_call_set_tail (call, true);
> +      else
> +	{
> +	  tree lhs = gimple_assign_lhs (stmt);
> +	  tree rhs1 = gimple_assign_rhs1 (stmt);
> +	  gcall *g;
> +	  if (TREE_CODE (rhs1) == CONSTRUCTOR)
> +	    g = gimple_build_call_internal (IFN_TAILCALL_ASSIGN, 0);
> +	  else
> +	    g = gimple_build_call_internal (IFN_TAILCALL_ASSIGN, 1,
> +					    rhs1);
> +	  gimple_call_set_lhs (g, lhs);
> +	  gimple_set_location (g, gimple_location (stmt));
> +	  if (gimple_vdef (stmt)
> +	      && TREE_CODE (gimple_vdef (stmt)) == SSA_NAME)
> +	    {
> +	      gimple_set_vdef (g, gimple_vdef (stmt));
> +	      SSA_NAME_DEF_STMT (gimple_vdef (g)) = g;
> +	    }
> +	  if (gimple_vuse (stmt))
> +	    gimple_set_vuse (g, gimple_vuse (stmt));
> +	  gsi_replace (&t->call_gsi, g, false);
> +	}
>        cfun->tail_call_marked = true;
>        if (dump_file && (dump_flags & TDF_DETAILS))
>          {
> --- gcc/tree-outof-ssa.c.jj	2017-10-09 09:41:21.000000000 +0200
> +++ gcc/tree-outof-ssa.c	2017-12-06 17:17:16.468209052 +0100
> @@ -311,7 +311,7 @@ insert_value_copy_on_edge (edge e, int d
>    else if (src_mode == BLKmode)
>      {
>        x = dest_rtx;
> -      store_expr (src, x, 0, false, false);
> +      store_expr (src, x, EXPAND_FLAG_NORMAL);
>      }
>    else
>      x = expand_expr (src, dest_rtx, dest_mode, EXPAND_NORMAL);
> --- gcc/tree-chkp.c.jj	2017-12-06 09:02:30.141951153 +0100
> +++ gcc/tree-chkp.c	2017-12-06 16:56:20.956518128 +0100
> @@ -481,7 +481,7 @@ chkp_expand_bounds_reset_for_mem (tree m
>  		 build_pointer_type (TREE_TYPE (mem)), mem);
>    bndstx = chkp_build_bndstx_call (addr, ptr, bnd);
>  
> -  expand_assignment (bnd, zero_bnd, false);
> +  expand_assignment (bnd, zero_bnd, EXPAND_FLAG_NORMAL);
>    expand_normal (bndstx);
>  }
>  
> --- gcc/function.c.jj	2017-12-06 09:02:29.991953021 +0100
> +++ gcc/function.c	2017-12-06 16:56:20.958518104 +0100
> @@ -3284,7 +3284,8 @@ assign_parm_setup_reg (struct assign_par
>        /* TREE_USED gets set erroneously during expand_assignment.  */
>        save_tree_used = TREE_USED (parm);
>        SET_DECL_RTL (parm, rtl);
> -      expand_assignment (parm, make_tree (data->nominal_type, tempreg), false);
> +      expand_assignment (parm, make_tree (data->nominal_type, tempreg),
> +			 EXPAND_FLAG_NORMAL);
>        SET_DECL_RTL (parm, NULL_RTX);
>        TREE_USED (parm) = save_tree_used;
>        all->first_conversion_insn = get_insns ();
> --- gcc/ubsan.c.jj	2017-12-06 09:02:29.951953519 +0100
> +++ gcc/ubsan.c	2017-12-06 16:56:20.960518080 +0100
> @@ -165,7 +165,7 @@ ubsan_encode_value (tree t, enum ubsan_e
>  	      rtx mem = assign_stack_temp_for_type (mode, GET_MODE_SIZE (mode),
>  						    type);
>  	      SET_DECL_RTL (var, mem);
> -	      expand_assignment (var, t, false);
> +	      expand_assignment (var, t, EXPAND_FLAG_NORMAL);
>  	      return build_fold_addr_expr (var);
>  	    }
>  	  if (phase != UBSAN_ENCODE_VALUE_GENERIC)
> --- gcc/cfgexpand.c.jj	2017-12-06 09:02:30.056952211 +0100
> +++ gcc/cfgexpand.c	2017-12-06 16:56:20.959518092 +0100
> @@ -2668,7 +2668,7 @@ expand_call_stmt (gcall *stmt)
>    rtx_insn *before_call = get_last_insn ();
>    lhs = gimple_call_lhs (stmt);
>    if (lhs)
> -    expand_assignment (lhs, exp, false);
> +    expand_assignment (lhs, exp, EXPAND_FLAG_NORMAL);
>    else
>      expand_expr (exp, const0_rtx, VOIDmode, EXPAND_NORMAL);
>  
> @@ -3071,7 +3071,7 @@ expand_asm_stmt (gasm *stmt)
>  	  generating_concat_p = old_generating_concat_p;
>  
>  	  push_to_sequence2 (after_rtl_seq, after_rtl_end);
> -	  expand_assignment (val, make_tree (type, op), false);
> +	  expand_assignment (val, make_tree (type, op), EXPAND_FLAG_NORMAL);
>  	  after_rtl_seq = get_insns ();
>  	  after_rtl_end = get_last_insn ();
>  	  end_sequence ();
> @@ -3672,9 +3672,14 @@ expand_gimple_stmt_1 (gimple *stmt)
>  		 this LHS.  */
>  	      ;
>  	    else
> -	      expand_assignment (lhs, rhs,
> -				 gimple_assign_nontemporal_move_p (
> -				   assign_stmt));
> +	      {
> +		enum expand_flag flag;
> +		if (gimple_assign_nontemporal_move_p (assign_stmt))
> +		  flag = EXPAND_FLAG_NONTEMPORAL;
> +		else
> +		  flag = EXPAND_FLAG_NORMAL;
> +		expand_assignment (lhs, rhs, flag);
> +	      }
>  	  }
>  	else
>  	  {
> --- gcc/calls.c.jj	2017-11-22 21:37:50.000000000 +0100
> +++ gcc/calls.c	2017-12-06 17:10:12.432363809 +0100
> @@ -1971,7 +1971,7 @@ initialize_argument_information (int num
>  	      else
>  		copy = assign_temp (type, 1, 0);
>  
> -	      store_expr (args[i].tree_value, copy, 0, false, false);
> +	      store_expr (args[i].tree_value, copy, EXPAND_FLAG_NORMAL);
>  
>  	      /* Just change the const function to pure and then let
>  		 the next test clear the pure based on
> --- gcc/expr.h.jj	2017-12-06 09:02:30.120951414 +0100
> +++ gcc/expr.h	2017-12-06 17:09:34.782823601 +0100
> @@ -35,6 +35,26 @@ enum expand_modifier {EXPAND_NORMAL = 0,
>  		      EXPAND_CONST_ADDRESS, EXPAND_INITIALIZER, EXPAND_WRITE,
>  		      EXPAND_MEMORY};
>  
> +/* Flags arguments for expand_assignment/store_expr*.  The argument is
> +   a bitwise or of these flags.  */
> +enum expand_flag {
> +  /* Value if none of the flags are set.  */
> +  EXPAND_FLAG_NORMAL = 0,
> +  /* Expand the assignment/store as nontemporal store if possible.  */
> +  EXPAND_FLAG_NONTEMPORAL = 1,
> +  /* If the assignment is expanded as a libcall, it can be a tail call.  */
> +  EXPAND_FLAG_TAILCALL = 2,
> +
> +  /* Flags below this point are only for store_expr*, not for
> +     expand_assignment.  */
> +
> +  /* Reverse bytes in the store.  */
> +  EXPAND_FLAG_REVERSE = 4,
> +  /* True for stores into call params on the stack, where block moves
> +     may need special treatment.  */
> +  EXPAND_FLAG_CALL_PARAM_P = 8
> +};
> +
>  /* Prevent the compiler from deferring stack pops.  See
>     inhibit_defer_pop for more information.  */
>  #define NO_DEFER_POP (inhibit_defer_pop += 1)
> @@ -244,14 +264,14 @@ extern void get_bit_range (unsigned HOST
>  			   tree, HOST_WIDE_INT *, tree *);
>  
>  /* Expand an assignment that stores the value of FROM into TO.  */
> -extern void expand_assignment (tree, tree, bool);
> +extern void expand_assignment (tree, tree, enum expand_flag);
>  
>  /* Generate code for computing expression EXP,
>     and storing the value into TARGET.
>     If SUGGEST_REG is nonzero, copy the value through a register
>     and return that register, if that is possible.  */
> -extern rtx store_expr_with_bounds (tree, rtx, int, bool, bool, tree);
> -extern rtx store_expr (tree, rtx, int, bool, bool);
> +extern rtx store_expr_with_bounds (tree, rtx, enum expand_flag, tree);
> +extern rtx store_expr (tree, rtx, enum expand_flag);
>  
>  /* Given an rtx that may include add and multiply operations,
>     generate them as insns and return a pseudo-reg containing the value.
> --- gcc/expr.c.jj	2017-12-06 09:02:30.103951626 +0100
> +++ gcc/expr.c	2017-12-06 17:54:46.185845429 +0100
> @@ -86,7 +86,7 @@ static void store_constructor_field (rtx
>  static void store_constructor (tree, rtx, int, HOST_WIDE_INT, bool);
>  static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
>  			unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
> -			machine_mode, tree, alias_set_type, bool, bool);
> +			machine_mode, tree, alias_set_type, enum expand_flag);
>  
>  static unsigned HOST_WIDE_INT highest_pow2_factor_for_target (const_tree, const_tree);
>  
> @@ -4915,11 +4915,12 @@ mem_ref_refers_to_non_mem_p (tree ref)
>    return addr_expr_of_non_mem_decl_p_1 (base, false);
>  }
>  
> -/* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
> -   is true, try generating a nontemporal store.  */
> +/* Expand an assignment that stores the value of FROM into TO.  If
> +   flags & EXPAND_FLAG_NONTEMPORAL, try generating a nontemporal store,
> +   if flags & EXPAND_FLAG_TAILCALL, allow generating a tail call.  */
>  
>  void
> -expand_assignment (tree to, tree from, bool nontemporal)
> +expand_assignment (tree to, tree from, enum expand_flag flags)
>  {
>    rtx to_rtx = 0;
>    rtx result;
> @@ -4927,6 +4928,10 @@ expand_assignment (tree to, tree from, b
>    unsigned int align;
>    enum insn_code icode;
>  
> +  /* Rest of the flags only make sense for store_*.  */
> +  gcc_checking_assert ((flags & ~(EXPAND_FLAG_NONTEMPORAL
> +				  | EXPAND_FLAG_TAILCALL)) == 0);
> +
>    /* Don't crash if the lhs of the assignment was erroneous.  */
>    if (TREE_CODE (to) == ERROR_MARK)
>      {
> @@ -4992,6 +4997,7 @@ expand_assignment (tree to, tree from, b
>        tree offset;
>        int unsignedp, reversep, volatilep = 0;
>        tree tem;
> +      enum expand_flag flags_rev = flags;
>  
>        push_temp_slots ();
>        tem = get_inner_reference (to, &bitsize, &bitpos, &offset, &mode1,
> @@ -5004,6 +5010,8 @@ expand_assignment (tree to, tree from, b
>  	  offset = size_int (bitpos >> LOG2_BITS_PER_UNIT);
>  	  bitpos &= BITS_PER_UNIT - 1;
>  	}
> +      if (reversep)
> +	flags_rev = (enum expand_flag) (flags | EXPAND_FLAG_REVERSE);
>  
>        if (TREE_CODE (to) == COMPONENT_REF
>  	  && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
> @@ -5114,22 +5122,21 @@ expand_assignment (tree to, tree from, b
>  	      && COMPLEX_MODE_P (GET_MODE (to_rtx))
>  	      && bitpos == 0
>  	      && bitsize == mode_bitsize)
> -	    result = store_expr (from, to_rtx, false, nontemporal, reversep);
> +	    result = store_expr (from, to_rtx, flags_rev);
>  	  else if (bitsize == mode_bitsize / 2
>  		   && (bitpos == 0 || bitpos == mode_bitsize / 2))
> -	    result = store_expr (from, XEXP (to_rtx, bitpos != 0), false,
> -				 nontemporal, reversep);
> +	    result = store_expr (from, XEXP (to_rtx, bitpos != 0), flags_rev);
>  	  else if (bitpos + bitsize <= mode_bitsize / 2)
>  	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
>  				  bitregion_start, bitregion_end,
>  				  mode1, from, get_alias_set (to),
> -				  nontemporal, reversep);
> +				  flags_rev);
>  	  else if (bitpos >= mode_bitsize / 2)
>  	    result = store_field (XEXP (to_rtx, 1), bitsize,
>  				  bitpos - mode_bitsize / 2,
>  				  bitregion_start, bitregion_end,
>  				  mode1, from, get_alias_set (to),
> -				  nontemporal, reversep);
> +				  flags_rev);
>  	  else if (bitpos == 0 && bitsize == mode_bitsize)
>  	    {
>  	      result = expand_normal (from);
> @@ -5166,7 +5173,8 @@ expand_assignment (tree to, tree from, b
>  	      result = store_field (temp, bitsize, bitpos,
>  				    bitregion_start, bitregion_end,
>  				    mode1, from, get_alias_set (to),
> -				    nontemporal, reversep);
> +				    (enum expand_flag)
> +				    (flags_rev & ~EXPAND_FLAG_TAILCALL));
>  	      emit_move_insn (XEXP (to_rtx, 0), read_complex_part (temp, false));
>  	      emit_move_insn (XEXP (to_rtx, 1), read_complex_part (temp, true));
>  	    }
> @@ -5192,7 +5200,7 @@ expand_assignment (tree to, tree from, b
>  	    result = store_field (to_rtx, bitsize, bitpos,
>  				  bitregion_start, bitregion_end,
>  				  mode1, from, get_alias_set (to),
> -				  nontemporal, reversep);
> +				  flags_rev);
>  	}
>  
>        if (result)
> @@ -5341,7 +5349,7 @@ expand_assignment (tree to, tree from, b
>    /* Compute FROM and store the value in the rtx we got.  */
>  
>    push_temp_slots ();
> -  result = store_expr_with_bounds (from, to_rtx, 0, nontemporal, false, to);
> +  result = store_expr_with_bounds (from, to_rtx, flags, to);
>    preserve_temp_slots (result);
>    pop_temp_slots ();
>    return;
> @@ -5375,19 +5383,14 @@ emit_storent_insn (rtx to, rtx from)
>     with no sequence point.  Will other languages need this to
>     be more thorough?
>  
> -   If CALL_PARAM_P is nonzero, this is a store into a call param on the
> -   stack, and block moves may need to be treated specially.
> -
> -   If NONTEMPORAL is true, try using a nontemporal store instruction.
> -
> -   If REVERSE is true, the store is to be done in reverse order.
> +   FLAGS is a bitwise or of EXPAND_FLAG_* defined in expr.h.
>  
>     If BTARGET is not NULL then computed bounds of EXP are
>     associated with BTARGET.  */
>  
>  rtx
> -store_expr_with_bounds (tree exp, rtx target, int call_param_p,
> -			bool nontemporal, bool reverse, tree btarget)
> +store_expr_with_bounds (tree exp, rtx target, enum expand_flag flags,
> +			tree btarget)
>  {
>    rtx temp;
>    rtx alt_rtl = NULL_RTX;
> @@ -5398,7 +5401,7 @@ store_expr_with_bounds (tree exp, rtx ta
>        /* C++ can generate ?: expressions with a throw expression in one
>  	 branch and an rvalue in the other. Here, we resolve attempts to
>  	 store the throw expression's nonexistent result.  */
> -      gcc_assert (!call_param_p);
> +      gcc_assert ((flags & EXPAND_FLAG_CALL_PARAM_P) == 0);
>        expand_expr (exp, const0_rtx, VOIDmode, EXPAND_NORMAL);
>        return NULL_RTX;
>      }
> @@ -5407,9 +5410,9 @@ store_expr_with_bounds (tree exp, rtx ta
>        /* Perform first part of compound expression, then assign from second
>  	 part.  */
>        expand_expr (TREE_OPERAND (exp, 0), const0_rtx, VOIDmode,
> -		   call_param_p ? EXPAND_STACK_PARM : EXPAND_NORMAL);
> -      return store_expr_with_bounds (TREE_OPERAND (exp, 1), target,
> -				     call_param_p, nontemporal, reverse,
> +		   (flags & EXPAND_FLAG_CALL_PARAM_P)
> +		   ? EXPAND_STACK_PARM : EXPAND_NORMAL);
> +      return store_expr_with_bounds (TREE_OPERAND (exp, 1), target, flags,
>  				     btarget);
>      }
>    else if (TREE_CODE (exp) == COND_EXPR && GET_MODE (target) == BLKmode)
> @@ -5425,13 +5428,15 @@ store_expr_with_bounds (tree exp, rtx ta
>        NO_DEFER_POP;
>        jumpifnot (TREE_OPERAND (exp, 0), lab1,
>  		 profile_probability::uninitialized ());
> -      store_expr_with_bounds (TREE_OPERAND (exp, 1), target, call_param_p,
> -			      nontemporal, reverse, btarget);
> +      store_expr_with_bounds (TREE_OPERAND (exp, 1), target,
> +			      (enum expand_flag)
> +			      (flags & ~EXPAND_FLAG_TAILCALL), btarget);
>        emit_jump_insn (targetm.gen_jump (lab2));
>        emit_barrier ();
>        emit_label (lab1);
> -      store_expr_with_bounds (TREE_OPERAND (exp, 2), target, call_param_p,
> -			      nontemporal, reverse, btarget);
> +      store_expr_with_bounds (TREE_OPERAND (exp, 2), target,
> +			      (enum expand_flag)
> +			      (flags & ~EXPAND_FLAG_TAILCALL), btarget);
>        emit_label (lab2);
>        OK_DEFER_POP;
>  
> @@ -5482,7 +5487,8 @@ store_expr_with_bounds (tree exp, rtx ta
>  	}
>  
>        temp = expand_expr (exp, inner_target, VOIDmode,
> -			  call_param_p ? EXPAND_STACK_PARM : EXPAND_NORMAL);
> +			  (flags & EXPAND_FLAG_CALL_PARAM_P)
> +			  ? EXPAND_STACK_PARM : EXPAND_NORMAL);
>  
>        /* Handle bounds returned by call.  */
>        if (TREE_CODE (exp) == CALL_EXPR)
> @@ -5518,7 +5524,8 @@ store_expr_with_bounds (tree exp, rtx ta
>  		&& TREE_CODE (TREE_OPERAND (TREE_OPERAND (exp, 0), 0))
>  		   == STRING_CST
>  		&& integer_zerop (TREE_OPERAND (exp, 1))))
> -	   && !nontemporal && !call_param_p
> +	   && (flags & (EXPAND_FLAG_NONTEMPORAL
> +			| EXPAND_FLAG_CALL_PARAM_P)) == 0
>  	   && MEM_P (target))
>      {
>        /* Optimize initialization of an array with a STRING_CST.  */
> @@ -5562,7 +5569,21 @@ store_expr_with_bounds (tree exp, rtx ta
>        if (exp_len > str_copy_len)
>  	clear_storage (adjust_address (dest_mem, BLKmode, 0),
>  		       GEN_INT (exp_len - str_copy_len),
> -		       BLOCK_OP_NORMAL);
> +		       (flags & EXPAND_FLAG_TAILCALL)
> +		       ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL);
> +      return NULL_RTX;
> +    }
> +  else if (flags == EXPAND_FLAG_TAILCALL
> +	   && TREE_CODE (exp) == CONSTRUCTOR
> +	   && TREE_CODE (TREE_TYPE (exp)) != ERROR_MARK
> +	   && TREE_STATIC (exp)
> +	   && !TREE_ADDRESSABLE (exp)
> +	   && TYPE_MODE (TREE_TYPE (exp)) == BLKmode
> +	   && MEM_P (target)
> +	   && GET_MODE (target) == BLKmode
> +	   && CONSTRUCTOR_NELTS (exp) == 0)
> +    {
> +      clear_storage (target, expr_size (exp), BLOCK_OP_TAILCALL);
>        return NULL_RTX;
>      }
>    else
> @@ -5572,9 +5593,10 @@ store_expr_with_bounds (tree exp, rtx ta
>    normal_expr:
>        /* If we want to use a nontemporal or a reverse order store, force the
>  	 value into a register first.  */
> -      tmp_target = nontemporal || reverse ? NULL_RTX : target;
> +      tmp_target = (flags & (EXPAND_FLAG_NONTEMPORAL | EXPAND_FLAG_REVERSE))
> +		   ? NULL_RTX : target;
>        temp = expand_expr_real (exp, tmp_target, GET_MODE (target),
> -			       (call_param_p
> +			       ((flags & EXPAND_FLAG_CALL_PARAM_P)
>  				? EXPAND_STACK_PARM : EXPAND_NORMAL),
>  			       &alt_rtl, false);
>  
> @@ -5647,7 +5669,8 @@ store_expr_with_bounds (tree exp, rtx ta
>  	      else
>  		store_bit_field (target,
>  				 INTVAL (expr_size (exp)) * BITS_PER_UNIT,
> -				 0, 0, 0, GET_MODE (temp), temp, reverse);
> +				 0, 0, 0, GET_MODE (temp), temp,
> +				 (flags & EXPAND_FLAG_REVERSE) != 0);
>  	    }
>  	  else
>  	    convert_move (target, temp, TYPE_UNSIGNED (TREE_TYPE (exp)));
> @@ -5664,8 +5687,10 @@ store_expr_with_bounds (tree exp, rtx ta
>  	  if (CONST_INT_P (size)
>  	      && INTVAL (size) < TREE_STRING_LENGTH (exp))
>  	    emit_block_move (target, temp, size,
> -			     (call_param_p
> -			      ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> +			     ((flags & EXPAND_FLAG_CALL_PARAM_P)
> +			      ? BLOCK_OP_CALL_PARM
> +			      : (flags & EXPAND_FLAG_TAILCALL)
> +			      ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL));
>  	  else
>  	    {
>  	      machine_mode pointer_mode
> @@ -5679,7 +5704,7 @@ store_expr_with_bounds (tree exp, rtx ta
>  				  size_int (TREE_STRING_LENGTH (exp)));
>  	      rtx copy_size_rtx
>  		= expand_expr (copy_size, NULL_RTX, VOIDmode,
> -			       (call_param_p
> +			       ((flags & EXPAND_FLAG_CALL_PARAM_P)
>  				? EXPAND_STACK_PARM : EXPAND_NORMAL));
>  	      rtx_code_label *label = 0;
>  
> @@ -5687,7 +5712,7 @@ store_expr_with_bounds (tree exp, rtx ta
>  	      copy_size_rtx = convert_to_mode (pointer_mode, copy_size_rtx,
>  					       TYPE_UNSIGNED (sizetype));
>  	      emit_block_move (target, temp, copy_size_rtx,
> -			       (call_param_p
> +			       ((flags & EXPAND_FLAG_CALL_PARAM_P)
>  				? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
>  
>  	      /* Figure out how much is left in TARGET that we have to clear.
> @@ -5739,14 +5764,17 @@ store_expr_with_bounds (tree exp, rtx ta
>  			  int_size_in_bytes (TREE_TYPE (exp)));
>        else if (GET_MODE (temp) == BLKmode)
>  	emit_block_move (target, temp, expr_size (exp),
> -			 (call_param_p
> -			  ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> +			 ((flags & EXPAND_FLAG_CALL_PARAM_P)
> +			  ? BLOCK_OP_CALL_PARM
> +			  : (flags & EXPAND_FLAG_TAILCALL)
> +			  ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL));
>        /* If we emit a nontemporal store, there is nothing else to do.  */
> -      else if (nontemporal && emit_storent_insn (target, temp))
> +      else if ((flags & EXPAND_FLAG_NONTEMPORAL)
> +	       && emit_storent_insn (target, temp))
>  	;
>        else
>  	{
> -	  if (reverse)
> +	  if (flags & EXPAND_FLAG_REVERSE)
>  	    temp = flip_storage_order (GET_MODE (target), temp);
>  	  temp = force_operand (temp, target);
>  	  if (temp != target)
> @@ -5759,11 +5787,9 @@ store_expr_with_bounds (tree exp, rtx ta
>  
>  /* Same as store_expr_with_bounds but ignoring bounds of EXP.  */
>  rtx
> -store_expr (tree exp, rtx target, int call_param_p, bool nontemporal,
> -	    bool reverse)
> +store_expr (tree exp, rtx target, enum expand_flag flags)
>  {
> -  return store_expr_with_bounds (exp, target, call_param_p, nontemporal,
> -				 reverse, NULL);
> +  return store_expr_with_bounds (exp, target, flags, NULL);
>  }
>  
>  /* Return true if field F of structure TYPE is a flexible array.  */
> @@ -6141,7 +6167,8 @@ store_constructor_field (rtx target, uns
>      }
>    else
>      store_field (target, bitsize, bitpos, bitregion_start, bitregion_end, mode,
> -		 exp, alias_set, false, reverse);
> +		 exp, alias_set,
> +		 reverse ? EXPAND_FLAG_REVERSE : EXPAND_FLAG_NORMAL);
>  }
>  
>  
> @@ -6338,6 +6365,8 @@ store_constructor (tree exp, rtx target,
>  
>  	/* The storage order is specified for every aggregate type.  */
>  	reverse = TYPE_REVERSE_STORAGE_ORDER (type);
> +	enum expand_flag flags_rev
> +	  = reverse ? EXPAND_FLAG_REVERSE : EXPAND_FLAG_NORMAL;
>  
>  	domain = TYPE_DOMAIN (type);
>  	const_bounds_p = (TYPE_MIN_VALUE (domain)
> @@ -6495,7 +6524,7 @@ store_constructor (tree exp, rtx target,
>  					VAR_DECL, NULL_TREE, domain);
>  		    index_r = gen_reg_rtx (promote_decl_mode (index, NULL));
>  		    SET_DECL_RTL (index, index_r);
> -		    store_expr (lo_index, index_r, 0, false, reverse);
> +		    store_expr (lo_index, index_r, flags_rev);
>  
>  		    /* Build the head of the loop.  */
>  		    do_pending_stack_adjust ();
> @@ -6522,7 +6551,7 @@ store_constructor (tree exp, rtx target,
>  		      store_constructor (value, xtarget, cleared,
>  					 bitsize / BITS_PER_UNIT, reverse);
>  		    else
> -		      store_expr (value, xtarget, 0, false, reverse);
> +		      store_expr (value, xtarget, flags_rev);
>  
>  		    /* Generate a conditional jump to exit the loop.  */
>  		    exit_cond = build2 (LT_EXPR, integer_type_node,
> @@ -6535,7 +6564,7 @@ store_constructor (tree exp, rtx target,
>  		    expand_assignment (index,
>  				       build2 (PLUS_EXPR, TREE_TYPE (index),
>  					       index, integer_one_node),
> -				       false);
> +				       EXPAND_FLAG_NORMAL);
>  
>  		    emit_jump (loop_start);
>  
> @@ -6566,7 +6595,7 @@ store_constructor (tree exp, rtx target,
>  					  expand_normal (position),
>  					  highest_pow2_factor (position));
>  		xtarget = adjust_address (xtarget, mode, 0);
> -		store_expr (value, xtarget, 0, false, reverse);
> +		store_expr (value, xtarget, flags_rev);
>  	      }
>  	    else
>  	      {
> @@ -6760,16 +6789,14 @@ store_constructor (tree exp, rtx target,
>     (in general) be different from that for TARGET, since TARGET is a
>     reference to the containing structure.
>  
> -   If NONTEMPORAL is true, try generating a nontemporal store.
> -
> -   If REVERSE is true, the store is to be done in reverse order.  */
> +   FLAGS is a bitmask of EXPAND_FLAG_* flags defined in expr.h.  */
>  
>  static rtx
>  store_field (rtx target, HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
>  	     unsigned HOST_WIDE_INT bitregion_start,
>  	     unsigned HOST_WIDE_INT bitregion_end,
>  	     machine_mode mode, tree exp,
> -	     alias_set_type alias_set, bool nontemporal,  bool reverse)
> +	     alias_set_type alias_set, enum expand_flag flags)
>  {
>    if (TREE_CODE (exp) == ERROR_MARK)
>      return const0_rtx;
> @@ -6787,7 +6814,7 @@ store_field (rtx target, HOST_WIDE_INT b
>        /* We're storing into a struct containing a single __complex.  */
>  
>        gcc_assert (!bitpos);
> -      return store_expr (exp, target, 0, nontemporal, reverse);
> +      return store_expr (exp, target, flags);
>      }
>  
>    /* If the structure is in a register or if the component
> @@ -6903,11 +6930,16 @@ store_field (rtx target, HOST_WIDE_INT b
>  	{
>  	  HOST_WIDE_INT size = GET_MODE_BITSIZE (temp_mode);
>  
> -	  reverse = TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (exp));
> +	  bool reverse = TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (exp));
>  
>  	  if (reverse)
>  	    temp = flip_storage_order (temp_mode, temp);
>  
> +	  if (reverse)
> +	    flags = (enum expand_flag) (flags | EXPAND_FLAG_REVERSE);
> +	  else
> +	    flags = (enum expand_flag) (flags & ~EXPAND_FLAG_REVERSE);
> +
>  	  if (bitsize < size
>  	      && reverse ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN
>  	      && !(mode == BLKmode && bitsize > BITS_PER_WORD))
> @@ -6937,7 +6969,8 @@ store_field (rtx target, HOST_WIDE_INT b
>  	  emit_block_move (target, temp,
>  			   GEN_INT ((bitsize + BITS_PER_UNIT - 1)
>  				    / BITS_PER_UNIT),
> -			   BLOCK_OP_NORMAL);
> +			   (flags & EXPAND_FLAG_TAILCALL)
> +			   ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL);
>  
>  	  return const0_rtx;
>  	}
> @@ -6954,7 +6987,7 @@ store_field (rtx target, HOST_WIDE_INT b
>        /* Store the value in the bitfield.  */
>        store_bit_field (target, bitsize, bitpos,
>  		       bitregion_start, bitregion_end,
> -		       mode, temp, reverse);
> +		       mode, temp, (flags & EXPAND_FLAG_REVERSE) != 0);
>  
>        return const0_rtx;
>      }
> @@ -6974,11 +7007,12 @@ store_field (rtx target, HOST_WIDE_INT b
>        if (TREE_CODE (exp) == CONSTRUCTOR && bitsize >= 0)
>  	{
>  	  gcc_assert (bitsize % BITS_PER_UNIT == 0);
> -	  store_constructor (exp, to_rtx, 0, bitsize / BITS_PER_UNIT, reverse);
> +	  store_constructor (exp, to_rtx, 0, bitsize / BITS_PER_UNIT,
> +			     (flags & EXPAND_FLAG_REVERSE) != 0);
>  	  return to_rtx;
>  	}
>  
> -      return store_expr (exp, to_rtx, 0, nontemporal, reverse);
> +      return store_expr (exp, to_rtx, flags);
>      }
>  }
>  
> @@ -8322,8 +8356,11 @@ expand_expr_real_2 (sepops ops, rtx targ
>  	    /* Store data into beginning of memory target.  */
>  	    store_expr (treeop0,
>  			adjust_address (target, TYPE_MODE (valtype), 0),
> -			modifier == EXPAND_STACK_PARM,
> -			false, TYPE_REVERSE_STORAGE_ORDER (type));
> +			(enum expand_flag)
> +			((modifier == EXPAND_STACK_PARM
> +			 ? EXPAND_FLAG_CALL_PARAM_P : EXPAND_FLAG_NORMAL)
> +			 | (TYPE_REVERSE_STORAGE_ORDER (type)
> +			    ? EXPAND_FLAG_REVERSE : EXPAND_FLAG_NORMAL)));
>  
>  	  else
>  	    {
> @@ -8337,7 +8374,7 @@ expand_expr_real_2 (sepops ops, rtx targ
>  				 * BITS_PER_UNIT),
>  				(HOST_WIDE_INT) GET_MODE_BITSIZE (mode)),
>  			   0, 0, 0, TYPE_MODE (valtype), treeop0, 0,
> -			   false, false);
> +			   EXPAND_FLAG_NORMAL);
>  	    }
>  
>  	  /* Return the entire union.  */
> @@ -9548,15 +9585,15 @@ expand_expr_real_2 (sepops ops, rtx targ
>  	jumpifnot (treeop0, lab0,
>  		   profile_probability::uninitialized ());
>  	store_expr (treeop1, temp,
> -		    modifier == EXPAND_STACK_PARM,
> -		    false, false);
> +		    modifier == EXPAND_STACK_PARM
> +		    ? EXPAND_FLAG_CALL_PARAM_P : EXPAND_FLAG_NORMAL);
>  
>  	emit_jump_insn (targetm.gen_jump (lab1));
>  	emit_barrier ();
>  	emit_label (lab0);
>  	store_expr (treeop2, temp,
> -		    modifier == EXPAND_STACK_PARM,
> -		    false, false);
> +		    modifier == EXPAND_STACK_PARM
> +		    ? EXPAND_FLAG_CALL_PARAM_P : EXPAND_FLAG_NORMAL);
>  
>  	emit_label (lab1);
>  	OK_DEFER_POP;
> @@ -10182,7 +10219,7 @@ expand_expr_real_1 (tree exp, rtx target
>  	      {
>  		temp = assign_stack_temp (DECL_MODE (base),
>  					  GET_MODE_SIZE (DECL_MODE (base)));
> -		store_expr (base, temp, 0, false, false);
> +		store_expr (base, temp, EXPAND_FLAG_NORMAL);
>  		temp = adjust_address (temp, BLKmode, offset);
>  		set_mem_size (temp, int_size_in_bytes (type));
>  		return temp;
> @@ -11075,13 +11112,13 @@ expand_expr_real_1 (tree exp, rtx target
>  		     value ? 0 : label,
>  		     profile_probability::uninitialized ());
>  	    expand_assignment (lhs, build_int_cst (TREE_TYPE (rhs), value),
> -			       false);
> +			       EXPAND_FLAG_NORMAL);
>  	    do_pending_stack_adjust ();
>  	    emit_label (label);
>  	    return const0_rtx;
>  	  }
>  
> -	expand_assignment (lhs, rhs, false);
> +	expand_assignment (lhs, rhs, EXPAND_FLAG_NORMAL);
>  	return const0_rtx;
>        }
>  
> --- gcc/testsuite/gcc.target/i386/pr41455.c.jj	2017-12-06 18:06:10.552506649 +0100
> +++ gcc/testsuite/gcc.target/i386/pr41455.c	2017-12-06 18:05:51.000000000 +0100
> @@ -0,0 +1,23 @@
> +/* PR middle-end/41455 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mstringop-strategy=libcall" } */
> +/* Verify we tail call memcpy and memset.  */
> +/* { dg-final { scan-assembler "jmp\[ \t]*_*memcpy" } } */
> +/* { dg-final { scan-assembler "jmp\[ \t]*_*memset" } } */
> +
> +struct S { char c[111111]; };
> +
> +void
> +foo (struct S *a, struct S *b, int *c)
> +{
> +  *c = 0;
> +  *a = *b;
> +}
> +
> +void
> +bar (struct S *a, int *b, int *c)
> +{
> +  *b = 0;
> +  *c = 0;
> +  *a = (struct S) {};
> +}
> 
> 	Jakub
> 
>
Jakub Jelinek Dec. 15, 2017, 9:59 a.m. UTC | #2
On Fri, Dec 15, 2017 at 10:30:32AM +0100, Richard Biener wrote:
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Hum, it doesn't look pretty ;)  Can we defer this to stage1 given
> it's a long-standing issue and we have quite big changes going in still?

Ok, deferred.

	Jakub
Jeff Law June 23, 2018, 5:12 a.m. UTC | #3
On 12/06/2017 02:04 PM, Jakub Jelinek wrote:
> Hi!
> 
> Aggregate assignments and clears aren't in GIMPLE represented as calls,
> and while often they expand inline, sometimes we emit libcalls for them.
> This patch allows us to tail call those libcalls if there is nothing
> after them.  The patch changes the tailcall pass, so that it recognizes
> a = b; and c = {}; statements under certain conditions as potential tail
> calls returning void, and if it finds good tail call candidates, it marks
> them specially.  Because we have only a single bit left for GIMPLE_ASSIGN,
> I've decided to wrap the rhs1 into a new internal call, so
> a = b; will be transformed into a = TAILCALL_ASSIGN (b); and
> c = {}; will be transformed into c = TAILCALL_ASSIGN ();
> The rest of the patch is about propagating the flag (may use tailcall if
> the emit_block_move or clear_storage is the last thing emitted) down
> through expand_assignment and functions it calls.
> 
> Those functions use 1-3 other flags, so instead of adding another bool
> to all of them (next to nontemporal, call_param_p, reverse) I've decided
> to pass around a bitmask of flags.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2017-12-06  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR target/41455
> 	PR target/82935
> 	* internal-fn.def (TAILCALL_ASSIGN): New internal function.
> 	* internal-fn.c (expand_LAUNDER): Pass EXPAND_FLAG_NORMAL to
> 	expand_assignment.
> 	(expand_TAILCALL_ASSIGN): New function.
> 	* tree-tailcall.c (struct tailcall): Adjust comment.
> 	(find_tail_calls): Recognize also aggregate assignments and
> 	aggregate clearing as possible tail calls.  Use is_gimple_assign
> 	instead of gimple_code check.
> 	(optimize_tail_call): Rewrite aggregate assignments or aggregate
> 	clearing in tail call positions using IFN_TAILCALL_ASSIGN
> 	internal function.
> 	* tree-outof-ssa.c (insert_value_copy_on_edge): Adjust store_expr
> 	caller.
> 	* tree-chkp.c (chkp_expand_bounds_reset_for_mem): Adjust
> 	expand_assignment caller.
> 	* function.c (assign_parm_setup_reg): Likewise.
> 	* ubsan.c (ubsan_encode_value): Likewise.
> 	* cfgexpand.c (expand_call_stmt, expand_asm_stmt): Likewise.
> 	(expand_gimple_stmt_1): Likewise.  Fix up formatting.
> 	* calls.c (initialize_argument_information): Adjust store_expr caller.
> 	* expr.h (enum expand_flag): New.
> 	(expand_assignment): Replace bool argument with enum expand_flag.
> 	(store_expr_with_bounds, store_expr): Replace int, bool, bool arguments
> 	with enum expand_flag.
> 	* expr.c (expand_assignment): Replace nontemporal argument with flags.
> 	Assert no bits other than EXPAND_FLAG_NONTEMPORAL and
> 	EXPAND_FLAG_TAILCALL are set.  Adjust store_expr, store_field and
> 	store_expr_with_bounds callers.
> 	(store_expr_with_bounds): Replace call_param_p, nontemporal and
> 	reverse args with flags argument.  Adjust recursive calls.  Pass
> 	BLOCK_OP_TAILCALL to clear_storage and emit_block_move if
> 	EXPAND_FLAG_TAILCALL is set.  Call clear_storage directly for
> 	EXPAND_FLAG_TAILCALL assignments from empty CONSTRUCTOR.
> 	(store_expr): Replace call_param_p, nontemporal and reverse args
> 	with flags argument.  Adjust store_expr_with_bounds caller.
> 	(store_constructor_field): Adjust store_field caller.
> 	(store_constructor): Adjust store_expr and expand_assignment callers.
> 	(store_field): Replace nontemporal and reverse arguments with flags
> 	argument.  Adjust store_expr callers.  Pass BLOCK_OP_TAILCALL to
> 	emit_block_move if EXPAND_FLAG_TAILCALL is set.
> 	(expand_expr_real_2): Adjust store_expr and store_field callers.
> 	(expand_expr_real_1): Adjust store_expr and expand_assignment callers.
> 
> 	* gcc.target/i386/pr41455.c: New test.
This looks pretty reasonable to me.  Is it big?  Yes, but a fair amount
is changing how we pass flags into the expanders.

I think you just need to merge back to the trunk, retest, and this
should be good to go.

jeff

Patch

--- gcc/internal-fn.def.jj	2017-12-06 09:02:30.072952012 +0100
+++ gcc/internal-fn.def	2017-12-06 16:56:20.958518104 +0100
@@ -254,6 +254,11 @@  DEF_INTERNAL_FN (LAUNDER, ECF_LEAF | ECF
 /* Divmod function.  */
 DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
 
+/* Special markup for aggregate copy or clear that can be implemented
+   using a tailcall.  lhs = rhs1; is represented by
+   lhs = TAILCALL_ASSIGN (rhs1); and lhs = {}; by lhs = TAILCALL_ASSIGN ();  */
+DEF_INTERNAL_FN (TAILCALL_ASSIGN, ECF_NOTHROW | ECF_LEAF, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_FLT_FLOATN_FN
--- gcc/internal-fn.c.jj	2017-12-06 09:02:29.968953307 +0100
+++ gcc/internal-fn.c	2017-12-06 18:00:15.993826828 +0100
@@ -2672,7 +2672,7 @@  expand_LAUNDER (internal_fn, gcall *call
   if (!lhs)
     return;
 
-  expand_assignment (lhs, gimple_call_arg (call, 0), false);
+  expand_assignment (lhs, gimple_call_arg (call, 0), EXPAND_FLAG_NORMAL);
 }
 
 /* Expand DIVMOD() using:
@@ -2722,6 +2722,23 @@  expand_DIVMOD (internal_fn, gcall *call_
 	       target, VOIDmode, EXPAND_NORMAL);
 }
 
+/* Expand TAILCALL_ASSIGN.  */
+
+static void
+expand_TAILCALL_ASSIGN (internal_fn, gcall *call_stmt)
+{
+  tree lhs = gimple_call_lhs (call_stmt);
+  tree rhs;
+  if (gimple_call_num_args (call_stmt) == 0)
+    {
+      rhs = build_constructor (TREE_TYPE (lhs), NULL);
+      TREE_STATIC (rhs) = 1;
+    }
+  else
+    rhs = gimple_call_arg (call_stmt, 0);
+  expand_assignment (lhs, rhs, EXPAND_FLAG_TAILCALL);
+}
+
 /* Expand a call to FN using the operands in STMT.  FN has a single
    output operand and NARGS input operands.  */
 
--- gcc/tree-tailcall.c.jj	2017-12-06 09:02:30.031952522 +0100
+++ gcc/tree-tailcall.c	2017-12-06 17:59:37.166299929 +0100
@@ -106,7 +106,8 @@  along with GCC; see the file COPYING3.
 
 struct tailcall
 {
-  /* The iterator pointing to the call statement.  */
+  /* The iterator pointing to the call (or aggregate copy that might be
+     expanded as call) statement.  */
   gimple_stmt_iterator call_gsi;
 
   /* True if it is a call to the current function.  */
@@ -398,8 +399,7 @@  static void
 find_tail_calls (basic_block bb, struct tailcall **ret)
 {
   tree ass_var = NULL_TREE, ret_var, func, param;
-  gimple *stmt;
-  gcall *call = NULL;
+  gimple *stmt, *call = NULL;
   gimple_stmt_iterator gsi, agsi;
   bool tail_recursion;
   struct tailcall *nw;
@@ -428,7 +428,7 @@  find_tail_calls (basic_block bb, struct
       /* Check for a call.  */
       if (is_gimple_call (stmt))
 	{
-	  call = as_a <gcall *> (stmt);
+	  call = stmt;
 	  ass_var = gimple_call_lhs (call);
 	  break;
 	}
@@ -440,6 +440,38 @@  find_tail_calls (basic_block bb, struct
 	  && auto_var_in_fn_p (gimple_assign_rhs1 (stmt), cfun->decl))
 	continue;
 
+      /* In addition to calls, allow aggregate copies that could be expanded
+	 as memcpy or memset.  Pretend it has NULL lhs then.  */
+      if (gimple_references_memory_p (stmt)
+	  && gimple_assign_single_p (stmt)
+	  && !gimple_has_volatile_ops (stmt)
+	  && !gimple_assign_nontemporal_move_p (as_a <gassign *> (stmt))
+	  && gimple_vdef (stmt)
+	  && !is_gimple_reg_type (TREE_TYPE (gimple_assign_lhs (stmt))))
+	{
+	  tree lhs = gimple_assign_lhs (stmt);
+	  if (TYPE_MODE (TREE_TYPE (lhs)) != BLKmode)
+	    return;
+	  tree rhs1 = gimple_assign_rhs1 (stmt);
+	  if (auto_var_in_fn_p (get_base_address (lhs), cfun->decl))
+	    return;
+	  if (TREE_CODE (rhs1) == CONSTRUCTOR)
+	    {
+	      if (CONSTRUCTOR_NELTS (rhs1) != 0 || !TREE_STATIC (rhs1))
+		return;
+	    }
+	  else if (auto_var_in_fn_p (get_base_address (rhs1), cfun->decl))
+	    return;
+	  if (reverse_storage_order_for_component_p (lhs)
+	      || reverse_storage_order_for_component_p (rhs1))
+	    return;
+	  if (operand_equal_p (lhs, rhs1, 0))
+	    return;
+	  call = stmt;
+	  ass_var = NULL_TREE;
+	  break;
+	}
+
       /* If the statement references memory or volatile operands, fail.  */
       if (gimple_references_memory_p (stmt)
 	  || gimple_has_volatile_ops (stmt))
@@ -474,7 +506,7 @@  find_tail_calls (basic_block bb, struct
 
   /* We found the call, check whether it is suitable.  */
   tail_recursion = false;
-  func = gimple_call_fndecl (call);
+  func = is_gimple_call (call) ? gimple_call_fndecl (call) : NULL_TREE;
   if (func
       && !DECL_BUILT_IN (func)
       && recursive_call_p (current_function_decl, func))
@@ -521,7 +553,9 @@  find_tail_calls (basic_block bb, struct
 	  && auto_var_in_fn_p (var, cfun->decl)
 	  && may_be_aliased (var)
 	  && (ref_maybe_used_by_stmt_p (call, var)
-	      || call_may_clobber_ref_p (call, var)))
+	      || (is_gimple_call (call)
+		  ? call_may_clobber_ref_p (as_a <gcall *> (call), var)
+		  : refs_output_dependent_p (gimple_assign_lhs (call), var))))
 	return;
     }
 
@@ -560,7 +594,7 @@  find_tail_calls (basic_block bb, struct
 	  || is_gimple_debug (stmt))
 	continue;
 
-      if (gimple_code (stmt) != GIMPLE_ASSIGN)
+      if (!is_gimple_assign (stmt))
 	return;
 
       /* This is a gimple assign. */
@@ -956,9 +990,31 @@  optimize_tail_call (struct tailcall *t,
 
   if (opt_tailcalls)
     {
-      gcall *stmt = as_a <gcall *> (gsi_stmt (t->call_gsi));
-
-      gimple_call_set_tail (stmt, true);
+      gimple *stmt = gsi_stmt (t->call_gsi);
+      if (gcall *call = dyn_cast <gcall *> (stmt))
+	gimple_call_set_tail (call, true);
+      else
+	{
+	  tree lhs = gimple_assign_lhs (stmt);
+	  tree rhs1 = gimple_assign_rhs1 (stmt);
+	  gcall *g;
+	  if (TREE_CODE (rhs1) == CONSTRUCTOR)
+	    g = gimple_build_call_internal (IFN_TAILCALL_ASSIGN, 0);
+	  else
+	    g = gimple_build_call_internal (IFN_TAILCALL_ASSIGN, 1,
+					    rhs1);
+	  gimple_call_set_lhs (g, lhs);
+	  gimple_set_location (g, gimple_location (stmt));
+	  if (gimple_vdef (stmt)
+	      && TREE_CODE (gimple_vdef (stmt)) == SSA_NAME)
+	    {
+	      gimple_set_vdef (g, gimple_vdef (stmt));
+	      SSA_NAME_DEF_STMT (gimple_vdef (g)) = g;
+	    }
+	  if (gimple_vuse (stmt))
+	    gimple_set_vuse (g, gimple_vuse (stmt));
+	  gsi_replace (&t->call_gsi, g, false);
+	}
       cfun->tail_call_marked = true;
       if (dump_file && (dump_flags & TDF_DETAILS))
         {
--- gcc/tree-outof-ssa.c.jj	2017-10-09 09:41:21.000000000 +0200
+++ gcc/tree-outof-ssa.c	2017-12-06 17:17:16.468209052 +0100
@@ -311,7 +311,7 @@  insert_value_copy_on_edge (edge e, int d
   else if (src_mode == BLKmode)
     {
       x = dest_rtx;
-      store_expr (src, x, 0, false, false);
+      store_expr (src, x, EXPAND_FLAG_NORMAL);
     }
   else
     x = expand_expr (src, dest_rtx, dest_mode, EXPAND_NORMAL);
--- gcc/tree-chkp.c.jj	2017-12-06 09:02:30.141951153 +0100
+++ gcc/tree-chkp.c	2017-12-06 16:56:20.956518128 +0100
@@ -481,7 +481,7 @@  chkp_expand_bounds_reset_for_mem (tree m
 		 build_pointer_type (TREE_TYPE (mem)), mem);
   bndstx = chkp_build_bndstx_call (addr, ptr, bnd);
 
-  expand_assignment (bnd, zero_bnd, false);
+  expand_assignment (bnd, zero_bnd, EXPAND_FLAG_NORMAL);
   expand_normal (bndstx);
 }
 
--- gcc/function.c.jj	2017-12-06 09:02:29.991953021 +0100
+++ gcc/function.c	2017-12-06 16:56:20.958518104 +0100
@@ -3284,7 +3284,8 @@  assign_parm_setup_reg (struct assign_par
       /* TREE_USED gets set erroneously during expand_assignment.  */
       save_tree_used = TREE_USED (parm);
       SET_DECL_RTL (parm, rtl);
-      expand_assignment (parm, make_tree (data->nominal_type, tempreg), false);
+      expand_assignment (parm, make_tree (data->nominal_type, tempreg),
+			 EXPAND_FLAG_NORMAL);
       SET_DECL_RTL (parm, NULL_RTX);
       TREE_USED (parm) = save_tree_used;
       all->first_conversion_insn = get_insns ();
--- gcc/ubsan.c.jj	2017-12-06 09:02:29.951953519 +0100
+++ gcc/ubsan.c	2017-12-06 16:56:20.960518080 +0100
@@ -165,7 +165,7 @@  ubsan_encode_value (tree t, enum ubsan_e
 	      rtx mem = assign_stack_temp_for_type (mode, GET_MODE_SIZE (mode),
 						    type);
 	      SET_DECL_RTL (var, mem);
-	      expand_assignment (var, t, false);
+	      expand_assignment (var, t, EXPAND_FLAG_NORMAL);
 	      return build_fold_addr_expr (var);
 	    }
 	  if (phase != UBSAN_ENCODE_VALUE_GENERIC)
--- gcc/cfgexpand.c.jj	2017-12-06 09:02:30.056952211 +0100
+++ gcc/cfgexpand.c	2017-12-06 16:56:20.959518092 +0100
@@ -2668,7 +2668,7 @@  expand_call_stmt (gcall *stmt)
   rtx_insn *before_call = get_last_insn ();
   lhs = gimple_call_lhs (stmt);
   if (lhs)
-    expand_assignment (lhs, exp, false);
+    expand_assignment (lhs, exp, EXPAND_FLAG_NORMAL);
   else
     expand_expr (exp, const0_rtx, VOIDmode, EXPAND_NORMAL);
 
@@ -3071,7 +3071,7 @@  expand_asm_stmt (gasm *stmt)
 	  generating_concat_p = old_generating_concat_p;
 
 	  push_to_sequence2 (after_rtl_seq, after_rtl_end);
-	  expand_assignment (val, make_tree (type, op), false);
+	  expand_assignment (val, make_tree (type, op), EXPAND_FLAG_NORMAL);
 	  after_rtl_seq = get_insns ();
 	  after_rtl_end = get_last_insn ();
 	  end_sequence ();
@@ -3672,9 +3672,14 @@  expand_gimple_stmt_1 (gimple *stmt)
 		 this LHS.  */
 	      ;
 	    else
-	      expand_assignment (lhs, rhs,
-				 gimple_assign_nontemporal_move_p (
-				   assign_stmt));
+	      {
+		enum expand_flag flag;
+		if (gimple_assign_nontemporal_move_p (assign_stmt))
+		  flag = EXPAND_FLAG_NONTEMPORAL;
+		else
+		  flag = EXPAND_FLAG_NORMAL;
+		expand_assignment (lhs, rhs, flag);
+	      }
 	  }
 	else
 	  {
--- gcc/calls.c.jj	2017-11-22 21:37:50.000000000 +0100
+++ gcc/calls.c	2017-12-06 17:10:12.432363809 +0100
@@ -1971,7 +1971,7 @@  initialize_argument_information (int num
 	      else
 		copy = assign_temp (type, 1, 0);
 
-	      store_expr (args[i].tree_value, copy, 0, false, false);
+	      store_expr (args[i].tree_value, copy, EXPAND_FLAG_NORMAL);
 
 	      /* Just change the const function to pure and then let
 		 the next test clear the pure based on
--- gcc/expr.h.jj	2017-12-06 09:02:30.120951414 +0100
+++ gcc/expr.h	2017-12-06 17:09:34.782823601 +0100
@@ -35,6 +35,26 @@  enum expand_modifier {EXPAND_NORMAL = 0,
 		      EXPAND_CONST_ADDRESS, EXPAND_INITIALIZER, EXPAND_WRITE,
 		      EXPAND_MEMORY};
 
+/* Flags arguments for expand_assignment/store_expr*.  The argument is
+   a bitwise or of these flags.  */
+enum expand_flag {
+  /* Value if none of the flags are set.  */
+  EXPAND_FLAG_NORMAL = 0,
+  /* Expand the assignment/store as nontemporal store if possible.  */
+  EXPAND_FLAG_NONTEMPORAL = 1,
+  /* If the assignment is expanded as a libcall, it can be a tail call.  */
+  EXPAND_FLAG_TAILCALL = 2,
+
+  /* Flags below this point are only for store_expr*, not for
+     expand_assignment.  */
+
+  /* Reverse bytes in the store.  */
+  EXPAND_FLAG_REVERSE = 4,
+  /* True for stores into call params on the stack, where block moves
+     may need special treatment.  */
+  EXPAND_FLAG_CALL_PARAM_P = 8
+};
+
 /* Prevent the compiler from deferring stack pops.  See
    inhibit_defer_pop for more information.  */
 #define NO_DEFER_POP (inhibit_defer_pop += 1)
@@ -244,14 +264,14 @@  extern void get_bit_range (unsigned HOST
 			   tree, HOST_WIDE_INT *, tree *);
 
 /* Expand an assignment that stores the value of FROM into TO.  */
-extern void expand_assignment (tree, tree, bool);
+extern void expand_assignment (tree, tree, enum expand_flag);
 
 /* Generate code for computing expression EXP,
    and storing the value into TARGET.
    If SUGGEST_REG is nonzero, copy the value through a register
    and return that register, if that is possible.  */
-extern rtx store_expr_with_bounds (tree, rtx, int, bool, bool, tree);
-extern rtx store_expr (tree, rtx, int, bool, bool);
+extern rtx store_expr_with_bounds (tree, rtx, enum expand_flag, tree);
+extern rtx store_expr (tree, rtx, enum expand_flag);
 
 /* Given an rtx that may include add and multiply operations,
    generate them as insns and return a pseudo-reg containing the value.
--- gcc/expr.c.jj	2017-12-06 09:02:30.103951626 +0100
+++ gcc/expr.c	2017-12-06 17:54:46.185845429 +0100
@@ -86,7 +86,7 @@  static void store_constructor_field (rtx
 static void store_constructor (tree, rtx, int, HOST_WIDE_INT, bool);
 static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
 			unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
-			machine_mode, tree, alias_set_type, bool, bool);
+			machine_mode, tree, alias_set_type, enum expand_flag);
 
 static unsigned HOST_WIDE_INT highest_pow2_factor_for_target (const_tree, const_tree);
 
@@ -4915,11 +4915,12 @@  mem_ref_refers_to_non_mem_p (tree ref)
   return addr_expr_of_non_mem_decl_p_1 (base, false);
 }
 
-/* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
-   is true, try generating a nontemporal store.  */
+/* Expand an assignment that stores the value of FROM into TO.  If
+   flags & EXPAND_FLAG_NONTEMPORAL, try generating a nontemporal store;
+   if flags & EXPAND_FLAG_TAILCALL, allow generating a tail call.  */
 
 void
-expand_assignment (tree to, tree from, bool nontemporal)
+expand_assignment (tree to, tree from, enum expand_flag flags)
 {
   rtx to_rtx = 0;
   rtx result;
@@ -4927,6 +4928,10 @@  expand_assignment (tree to, tree from, b
   unsigned int align;
   enum insn_code icode;
 
+  /* Rest of the flags only make sense for store_*.  */
+  gcc_checking_assert ((flags & ~(EXPAND_FLAG_NONTEMPORAL
+				  | EXPAND_FLAG_TAILCALL)) == 0);
+
   /* Don't crash if the lhs of the assignment was erroneous.  */
   if (TREE_CODE (to) == ERROR_MARK)
     {
@@ -4992,6 +4997,7 @@  expand_assignment (tree to, tree from, b
       tree offset;
       int unsignedp, reversep, volatilep = 0;
       tree tem;
+      enum expand_flag flags_rev = flags;
 
       push_temp_slots ();
       tem = get_inner_reference (to, &bitsize, &bitpos, &offset, &mode1,
@@ -5004,6 +5010,8 @@  expand_assignment (tree to, tree from, b
 	  offset = size_int (bitpos >> LOG2_BITS_PER_UNIT);
 	  bitpos &= BITS_PER_UNIT - 1;
 	}
+      if (reversep)
+	flags_rev = (enum expand_flag) (flags | EXPAND_FLAG_REVERSE);
 
       if (TREE_CODE (to) == COMPONENT_REF
 	  && DECL_BIT_FIELD_TYPE (TREE_OPERAND (to, 1)))
@@ -5114,22 +5122,21 @@  expand_assignment (tree to, tree from, b
 	      && COMPLEX_MODE_P (GET_MODE (to_rtx))
 	      && bitpos == 0
 	      && bitsize == mode_bitsize)
-	    result = store_expr (from, to_rtx, false, nontemporal, reversep);
+	    result = store_expr (from, to_rtx, flags_rev);
 	  else if (bitsize == mode_bitsize / 2
 		   && (bitpos == 0 || bitpos == mode_bitsize / 2))
-	    result = store_expr (from, XEXP (to_rtx, bitpos != 0), false,
-				 nontemporal, reversep);
+	    result = store_expr (from, XEXP (to_rtx, bitpos != 0), flags_rev);
 	  else if (bitpos + bitsize <= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
 				  bitregion_start, bitregion_end,
 				  mode1, from, get_alias_set (to),
-				  nontemporal, reversep);
+				  flags_rev);
 	  else if (bitpos >= mode_bitsize / 2)
 	    result = store_field (XEXP (to_rtx, 1), bitsize,
 				  bitpos - mode_bitsize / 2,
 				  bitregion_start, bitregion_end,
 				  mode1, from, get_alias_set (to),
-				  nontemporal, reversep);
+				  flags_rev);
 	  else if (bitpos == 0 && bitsize == mode_bitsize)
 	    {
 	      result = expand_normal (from);
@@ -5166,7 +5173,8 @@  expand_assignment (tree to, tree from, b
 	      result = store_field (temp, bitsize, bitpos,
 				    bitregion_start, bitregion_end,
 				    mode1, from, get_alias_set (to),
-				    nontemporal, reversep);
+				    (enum expand_flag)
+				    (flags_rev & ~EXPAND_FLAG_TAILCALL));
 	      emit_move_insn (XEXP (to_rtx, 0), read_complex_part (temp, false));
 	      emit_move_insn (XEXP (to_rtx, 1), read_complex_part (temp, true));
 	    }
@@ -5192,7 +5200,7 @@  expand_assignment (tree to, tree from, b
 	    result = store_field (to_rtx, bitsize, bitpos,
 				  bitregion_start, bitregion_end,
 				  mode1, from, get_alias_set (to),
-				  nontemporal, reversep);
+				  flags_rev);
 	}
 
       if (result)
@@ -5341,7 +5349,7 @@  expand_assignment (tree to, tree from, b
   /* Compute FROM and store the value in the rtx we got.  */
 
   push_temp_slots ();
-  result = store_expr_with_bounds (from, to_rtx, 0, nontemporal, false, to);
+  result = store_expr_with_bounds (from, to_rtx, flags, to);
   preserve_temp_slots (result);
   pop_temp_slots ();
   return;
@@ -5375,19 +5383,14 @@  emit_storent_insn (rtx to, rtx from)
    with no sequence point.  Will other languages need this to
    be more thorough?
 
-   If CALL_PARAM_P is nonzero, this is a store into a call param on the
-   stack, and block moves may need to be treated specially.
-
-   If NONTEMPORAL is true, try using a nontemporal store instruction.
-
-   If REVERSE is true, the store is to be done in reverse order.
+   FLAGS is a bitwise or of EXPAND_FLAG_* defined in expr.h.
 
    If BTARGET is not NULL then computed bounds of EXP are
    associated with BTARGET.  */
 
 rtx
-store_expr_with_bounds (tree exp, rtx target, int call_param_p,
-			bool nontemporal, bool reverse, tree btarget)
+store_expr_with_bounds (tree exp, rtx target, enum expand_flag flags,
+			tree btarget)
 {
   rtx temp;
   rtx alt_rtl = NULL_RTX;
@@ -5398,7 +5401,7 @@  store_expr_with_bounds (tree exp, rtx ta
       /* C++ can generate ?: expressions with a throw expression in one
 	 branch and an rvalue in the other. Here, we resolve attempts to
 	 store the throw expression's nonexistent result.  */
-      gcc_assert (!call_param_p);
+      gcc_assert ((flags & EXPAND_FLAG_CALL_PARAM_P) == 0);
       expand_expr (exp, const0_rtx, VOIDmode, EXPAND_NORMAL);
       return NULL_RTX;
     }
@@ -5407,9 +5410,9 @@  store_expr_with_bounds (tree exp, rtx ta
       /* Perform first part of compound expression, then assign from second
 	 part.  */
       expand_expr (TREE_OPERAND (exp, 0), const0_rtx, VOIDmode,
-		   call_param_p ? EXPAND_STACK_PARM : EXPAND_NORMAL);
-      return store_expr_with_bounds (TREE_OPERAND (exp, 1), target,
-				     call_param_p, nontemporal, reverse,
+		   (flags & EXPAND_FLAG_CALL_PARAM_P)
+		   ? EXPAND_STACK_PARM : EXPAND_NORMAL);
+      return store_expr_with_bounds (TREE_OPERAND (exp, 1), target, flags,
 				     btarget);
     }
   else if (TREE_CODE (exp) == COND_EXPR && GET_MODE (target) == BLKmode)
@@ -5425,13 +5428,15 @@  store_expr_with_bounds (tree exp, rtx ta
       NO_DEFER_POP;
       jumpifnot (TREE_OPERAND (exp, 0), lab1,
 		 profile_probability::uninitialized ());
-      store_expr_with_bounds (TREE_OPERAND (exp, 1), target, call_param_p,
-			      nontemporal, reverse, btarget);
+      store_expr_with_bounds (TREE_OPERAND (exp, 1), target,
+			      (enum expand_flag)
+			      (flags & ~EXPAND_FLAG_TAILCALL), btarget);
       emit_jump_insn (targetm.gen_jump (lab2));
       emit_barrier ();
       emit_label (lab1);
-      store_expr_with_bounds (TREE_OPERAND (exp, 2), target, call_param_p,
-			      nontemporal, reverse, btarget);
+      store_expr_with_bounds (TREE_OPERAND (exp, 2), target,
+			      (enum expand_flag)
+			      (flags & ~EXPAND_FLAG_TAILCALL), btarget);
       emit_label (lab2);
       OK_DEFER_POP;
 
@@ -5482,7 +5487,8 @@  store_expr_with_bounds (tree exp, rtx ta
 	}
 
       temp = expand_expr (exp, inner_target, VOIDmode,
-			  call_param_p ? EXPAND_STACK_PARM : EXPAND_NORMAL);
+			  (flags & EXPAND_FLAG_CALL_PARAM_P)
+			  ? EXPAND_STACK_PARM : EXPAND_NORMAL);
 
       /* Handle bounds returned by call.  */
       if (TREE_CODE (exp) == CALL_EXPR)
@@ -5518,7 +5524,8 @@  store_expr_with_bounds (tree exp, rtx ta
 		&& TREE_CODE (TREE_OPERAND (TREE_OPERAND (exp, 0), 0))
 		   == STRING_CST
 		&& integer_zerop (TREE_OPERAND (exp, 1))))
-	   && !nontemporal && !call_param_p
+	   && (flags & (EXPAND_FLAG_NONTEMPORAL
+			| EXPAND_FLAG_CALL_PARAM_P)) == 0
 	   && MEM_P (target))
     {
       /* Optimize initialization of an array with a STRING_CST.  */
@@ -5562,7 +5569,21 @@  store_expr_with_bounds (tree exp, rtx ta
       if (exp_len > str_copy_len)
 	clear_storage (adjust_address (dest_mem, BLKmode, 0),
 		       GEN_INT (exp_len - str_copy_len),
-		       BLOCK_OP_NORMAL);
+		       (flags & EXPAND_FLAG_TAILCALL)
+		       ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL);
+      return NULL_RTX;
+    }
+  else if (flags == EXPAND_FLAG_TAILCALL
+	   && TREE_CODE (exp) == CONSTRUCTOR
+	   && TREE_CODE (TREE_TYPE (exp)) != ERROR_MARK
+	   && TREE_STATIC (exp)
+	   && !TREE_ADDRESSABLE (exp)
+	   && TYPE_MODE (TREE_TYPE (exp)) == BLKmode
+	   && MEM_P (target)
+	   && GET_MODE (target) == BLKmode
+	   && CONSTRUCTOR_NELTS (exp) == 0)
+    {
+      clear_storage (target, expr_size (exp), BLOCK_OP_TAILCALL);
       return NULL_RTX;
     }
   else
@@ -5572,9 +5593,10 @@  store_expr_with_bounds (tree exp, rtx ta
   normal_expr:
       /* If we want to use a nontemporal or a reverse order store, force the
 	 value into a register first.  */
-      tmp_target = nontemporal || reverse ? NULL_RTX : target;
+      tmp_target = (flags & (EXPAND_FLAG_NONTEMPORAL | EXPAND_FLAG_REVERSE))
+		   ? NULL_RTX : target;
       temp = expand_expr_real (exp, tmp_target, GET_MODE (target),
-			       (call_param_p
+			       ((flags & EXPAND_FLAG_CALL_PARAM_P)
 				? EXPAND_STACK_PARM : EXPAND_NORMAL),
 			       &alt_rtl, false);
 
@@ -5647,7 +5669,8 @@  store_expr_with_bounds (tree exp, rtx ta
 	      else
 		store_bit_field (target,
 				 INTVAL (expr_size (exp)) * BITS_PER_UNIT,
-				 0, 0, 0, GET_MODE (temp), temp, reverse);
+				 0, 0, 0, GET_MODE (temp), temp,
+				 (flags & EXPAND_FLAG_REVERSE) != 0);
 	    }
 	  else
 	    convert_move (target, temp, TYPE_UNSIGNED (TREE_TYPE (exp)));
@@ -5664,8 +5687,10 @@  store_expr_with_bounds (tree exp, rtx ta
 	  if (CONST_INT_P (size)
 	      && INTVAL (size) < TREE_STRING_LENGTH (exp))
 	    emit_block_move (target, temp, size,
-			     (call_param_p
-			      ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
+			     ((flags & EXPAND_FLAG_CALL_PARAM_P)
+			      ? BLOCK_OP_CALL_PARM
+			      : (flags & EXPAND_FLAG_TAILCALL)
+			      ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL));
 	  else
 	    {
 	      machine_mode pointer_mode
@@ -5679,7 +5704,7 @@  store_expr_with_bounds (tree exp, rtx ta
 				  size_int (TREE_STRING_LENGTH (exp)));
 	      rtx copy_size_rtx
 		= expand_expr (copy_size, NULL_RTX, VOIDmode,
-			       (call_param_p
+			       ((flags & EXPAND_FLAG_CALL_PARAM_P)
 				? EXPAND_STACK_PARM : EXPAND_NORMAL));
 	      rtx_code_label *label = 0;
 
@@ -5687,7 +5712,7 @@  store_expr_with_bounds (tree exp, rtx ta
 	      copy_size_rtx = convert_to_mode (pointer_mode, copy_size_rtx,
 					       TYPE_UNSIGNED (sizetype));
 	      emit_block_move (target, temp, copy_size_rtx,
-			       (call_param_p
+			       ((flags & EXPAND_FLAG_CALL_PARAM_P)
 				? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
 
 	      /* Figure out how much is left in TARGET that we have to clear.
@@ -5739,14 +5764,17 @@  store_expr_with_bounds (tree exp, rtx ta
 			  int_size_in_bytes (TREE_TYPE (exp)));
       else if (GET_MODE (temp) == BLKmode)
 	emit_block_move (target, temp, expr_size (exp),
-			 (call_param_p
-			  ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
+			 ((flags & EXPAND_FLAG_CALL_PARAM_P)
+			  ? BLOCK_OP_CALL_PARM
+			  : (flags & EXPAND_FLAG_TAILCALL)
+			  ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL));
       /* If we emit a nontemporal store, there is nothing else to do.  */
-      else if (nontemporal && emit_storent_insn (target, temp))
+      else if ((flags & EXPAND_FLAG_NONTEMPORAL)
+	       && emit_storent_insn (target, temp))
 	;
       else
 	{
-	  if (reverse)
+	  if (flags & EXPAND_FLAG_REVERSE)
 	    temp = flip_storage_order (GET_MODE (target), temp);
 	  temp = force_operand (temp, target);
 	  if (temp != target)
@@ -5759,11 +5787,9 @@  store_expr_with_bounds (tree exp, rtx ta
 
 /* Same as store_expr_with_bounds but ignoring bounds of EXP.  */
 rtx
-store_expr (tree exp, rtx target, int call_param_p, bool nontemporal,
-	    bool reverse)
+store_expr (tree exp, rtx target, enum expand_flag flags)
 {
-  return store_expr_with_bounds (exp, target, call_param_p, nontemporal,
-				 reverse, NULL);
+  return store_expr_with_bounds (exp, target, flags, NULL);
 }
 
 /* Return true if field F of structure TYPE is a flexible array.  */
@@ -6141,7 +6167,8 @@  store_constructor_field (rtx target, uns
     }
   else
     store_field (target, bitsize, bitpos, bitregion_start, bitregion_end, mode,
-		 exp, alias_set, false, reverse);
+		 exp, alias_set,
+		 reverse ? EXPAND_FLAG_REVERSE : EXPAND_FLAG_NORMAL);
 }
 
 
@@ -6338,6 +6365,8 @@  store_constructor (tree exp, rtx target,
 
 	/* The storage order is specified for every aggregate type.  */
 	reverse = TYPE_REVERSE_STORAGE_ORDER (type);
+	enum expand_flag flags_rev
+	  = reverse ? EXPAND_FLAG_REVERSE : EXPAND_FLAG_NORMAL;
 
 	domain = TYPE_DOMAIN (type);
 	const_bounds_p = (TYPE_MIN_VALUE (domain)
@@ -6495,7 +6524,7 @@  store_constructor (tree exp, rtx target,
 					VAR_DECL, NULL_TREE, domain);
 		    index_r = gen_reg_rtx (promote_decl_mode (index, NULL));
 		    SET_DECL_RTL (index, index_r);
-		    store_expr (lo_index, index_r, 0, false, reverse);
+		    store_expr (lo_index, index_r, flags_rev);
 
 		    /* Build the head of the loop.  */
 		    do_pending_stack_adjust ();
@@ -6522,7 +6551,7 @@  store_constructor (tree exp, rtx target,
 		      store_constructor (value, xtarget, cleared,
 					 bitsize / BITS_PER_UNIT, reverse);
 		    else
-		      store_expr (value, xtarget, 0, false, reverse);
+		      store_expr (value, xtarget, flags_rev);
 
 		    /* Generate a conditional jump to exit the loop.  */
 		    exit_cond = build2 (LT_EXPR, integer_type_node,
@@ -6535,7 +6564,7 @@  store_constructor (tree exp, rtx target,
 		    expand_assignment (index,
 				       build2 (PLUS_EXPR, TREE_TYPE (index),
 					       index, integer_one_node),
-				       false);
+				       EXPAND_FLAG_NORMAL);
 
 		    emit_jump (loop_start);
 
@@ -6566,7 +6595,7 @@  store_constructor (tree exp, rtx target,
 					  expand_normal (position),
 					  highest_pow2_factor (position));
 		xtarget = adjust_address (xtarget, mode, 0);
-		store_expr (value, xtarget, 0, false, reverse);
+		store_expr (value, xtarget, flags_rev);
 	      }
 	    else
 	      {
@@ -6760,16 +6789,14 @@  store_constructor (tree exp, rtx target,
    (in general) be different from that for TARGET, since TARGET is a
    reference to the containing structure.
 
-   If NONTEMPORAL is true, try generating a nontemporal store.
-
-   If REVERSE is true, the store is to be done in reverse order.  */
+   FLAGS is a bitmask of EXPAND_FLAG_* flags defined in expr.h.  */
 
 static rtx
 store_field (rtx target, HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
 	     unsigned HOST_WIDE_INT bitregion_start,
 	     unsigned HOST_WIDE_INT bitregion_end,
 	     machine_mode mode, tree exp,
-	     alias_set_type alias_set, bool nontemporal,  bool reverse)
+	     alias_set_type alias_set, enum expand_flag flags)
 {
   if (TREE_CODE (exp) == ERROR_MARK)
     return const0_rtx;
@@ -6787,7 +6814,7 @@  store_field (rtx target, HOST_WIDE_INT b
       /* We're storing into a struct containing a single __complex.  */
 
       gcc_assert (!bitpos);
-      return store_expr (exp, target, 0, nontemporal, reverse);
+      return store_expr (exp, target, flags);
     }
 
   /* If the structure is in a register or if the component
@@ -6903,11 +6930,16 @@  store_field (rtx target, HOST_WIDE_INT b
 	{
 	  HOST_WIDE_INT size = GET_MODE_BITSIZE (temp_mode);
 
-	  reverse = TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (exp));
+	  bool reverse = TYPE_REVERSE_STORAGE_ORDER (TREE_TYPE (exp));
 
 	  if (reverse)
 	    temp = flip_storage_order (temp_mode, temp);
 
+	  if (reverse)
+	    flags = (enum expand_flag) (flags | EXPAND_FLAG_REVERSE);
+	  else
+	    flags = (enum expand_flag) (flags & ~EXPAND_FLAG_REVERSE);
+
 	  if (bitsize < size
 	      && reverse ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN
 	      && !(mode == BLKmode && bitsize > BITS_PER_WORD))
@@ -6937,7 +6969,8 @@  store_field (rtx target, HOST_WIDE_INT b
 	  emit_block_move (target, temp,
 			   GEN_INT ((bitsize + BITS_PER_UNIT - 1)
 				    / BITS_PER_UNIT),
-			   BLOCK_OP_NORMAL);
+			   (flags & EXPAND_FLAG_TAILCALL)
+			   ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL);
 
 	  return const0_rtx;
 	}
@@ -6954,7 +6987,7 @@  store_field (rtx target, HOST_WIDE_INT b
       /* Store the value in the bitfield.  */
       store_bit_field (target, bitsize, bitpos,
 		       bitregion_start, bitregion_end,
-		       mode, temp, reverse);
+		       mode, temp, (flags & EXPAND_FLAG_REVERSE) != 0);
 
       return const0_rtx;
     }
@@ -6974,11 +7007,12 @@  store_field (rtx target, HOST_WIDE_INT b
       if (TREE_CODE (exp) == CONSTRUCTOR && bitsize >= 0)
 	{
 	  gcc_assert (bitsize % BITS_PER_UNIT == 0);
-	  store_constructor (exp, to_rtx, 0, bitsize / BITS_PER_UNIT, reverse);
+	  store_constructor (exp, to_rtx, 0, bitsize / BITS_PER_UNIT,
+			     (flags & EXPAND_FLAG_REVERSE) != 0);
 	  return to_rtx;
 	}
 
-      return store_expr (exp, to_rtx, 0, nontemporal, reverse);
+      return store_expr (exp, to_rtx, flags);
     }
 }
 
@@ -8322,8 +8356,11 @@  expand_expr_real_2 (sepops ops, rtx targ
 	    /* Store data into beginning of memory target.  */
 	    store_expr (treeop0,
 			adjust_address (target, TYPE_MODE (valtype), 0),
-			modifier == EXPAND_STACK_PARM,
-			false, TYPE_REVERSE_STORAGE_ORDER (type));
+			(enum expand_flag)
+			((modifier == EXPAND_STACK_PARM
+			 ? EXPAND_FLAG_CALL_PARAM_P : EXPAND_FLAG_NORMAL)
+			 | (TYPE_REVERSE_STORAGE_ORDER (type)
+			    ? EXPAND_FLAG_REVERSE : EXPAND_FLAG_NORMAL)));
 
 	  else
 	    {
@@ -8337,7 +8374,7 @@  expand_expr_real_2 (sepops ops, rtx targ
 				 * BITS_PER_UNIT),
 				(HOST_WIDE_INT) GET_MODE_BITSIZE (mode)),
 			   0, 0, 0, TYPE_MODE (valtype), treeop0, 0,
-			   false, false);
+			   EXPAND_FLAG_NORMAL);
 	    }
 
 	  /* Return the entire union.  */
@@ -9548,15 +9585,15 @@  expand_expr_real_2 (sepops ops, rtx targ
 	jumpifnot (treeop0, lab0,
 		   profile_probability::uninitialized ());
 	store_expr (treeop1, temp,
-		    modifier == EXPAND_STACK_PARM,
-		    false, false);
+		    modifier == EXPAND_STACK_PARM
+		    ? EXPAND_FLAG_CALL_PARAM_P : EXPAND_FLAG_NORMAL);
 
 	emit_jump_insn (targetm.gen_jump (lab1));
 	emit_barrier ();
 	emit_label (lab0);
 	store_expr (treeop2, temp,
-		    modifier == EXPAND_STACK_PARM,
-		    false, false);
+		    modifier == EXPAND_STACK_PARM
+		    ? EXPAND_FLAG_CALL_PARAM_P : EXPAND_FLAG_NORMAL);
 
 	emit_label (lab1);
 	OK_DEFER_POP;
@@ -10182,7 +10219,7 @@  expand_expr_real_1 (tree exp, rtx target
 	      {
 		temp = assign_stack_temp (DECL_MODE (base),
 					  GET_MODE_SIZE (DECL_MODE (base)));
-		store_expr (base, temp, 0, false, false);
+		store_expr (base, temp, EXPAND_FLAG_NORMAL);
 		temp = adjust_address (temp, BLKmode, offset);
 		set_mem_size (temp, int_size_in_bytes (type));
 		return temp;
@@ -11075,13 +11112,13 @@  expand_expr_real_1 (tree exp, rtx target
 		     value ? 0 : label,
 		     profile_probability::uninitialized ());
 	    expand_assignment (lhs, build_int_cst (TREE_TYPE (rhs), value),
-			       false);
+			       EXPAND_FLAG_NORMAL);
 	    do_pending_stack_adjust ();
 	    emit_label (label);
 	    return const0_rtx;
 	  }
 
-	expand_assignment (lhs, rhs, false);
+	expand_assignment (lhs, rhs, EXPAND_FLAG_NORMAL);
 	return const0_rtx;
       }
 
--- gcc/testsuite/gcc.target/i386/pr41455.c.jj	2017-12-06 18:06:10.552506649 +0100
+++ gcc/testsuite/gcc.target/i386/pr41455.c	2017-12-06 18:05:51.000000000 +0100
@@ -0,0 +1,23 @@ 
+/* PR middle-end/41455 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mstringop-strategy=libcall" } */
+/* Verify we tail call memcpy and memset.  */
+/* { dg-final { scan-assembler "jmp\[ \t]*_*memcpy" } } */
+/* { dg-final { scan-assembler "jmp\[ \t]*_*memset" } } */
+
+struct S { char c[111111]; };
+
+void
+foo (struct S *a, struct S *b, int *c)
+{
+  *c = 0;
+  *a = *b;
+}
+
+void
+bar (struct S *a, int *b, int *c)
+{
+  *b = 0;
+  *c = 0;
+  *a = (struct S) {};
+}