
sra: SRA of non-escaped aggregates passed by reference to calls

Message ID ri61qcpr6bw.fsf@
State New
Series sra: SRA of non-escaped aggregates passed by reference to calls

Commit Message

Martin Jambor Nov. 16, 2023, 4:49 p.m. UTC
Hello,

PR109849 shows that a loop that heavily pushes and pops from a stack
implemented by a C++ std::vector results in slow code, mainly because the
vector structure is not split by SRA and so we end up with many loads
and stores into it.  This is because it is passed by reference
to (re)allocation methods and so needs to live in memory, even though
it does not escape from them and so we could SRA it if we
re-constructed it before the call and then separated it to distinct
replacements afterwards.
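
To illustrate with a hand-written sketch (not actual pass output; the
structure, the callee and the replacement variables below are made up
for illustration), the transformation conceptually turns the first
function into the second:

  struct S { long a; int b; };
  void grow (struct S *);      /* uses but does not capture its argument */

  void
  before (void)
  {
    struct S s = { 0, 0 };
    grow (&s);                 /* taking &s forces s to live in memory */
    s.b++;                     /* a load and a store through memory */
  }

  void
  after (void)
  {
    struct S s;
    long s_a = 0;              /* scalar replacements of s.a and s.b */
    int s_b = 0;
    s.a = s_a;                 /* re-construct the aggregate ...  */
    s.b = s_b;
    grow (&s);
    s_a = s.a;                 /* ... and separate it back afterwards */
    s_b = s.b;
    s_b++;                     /* now a register operation */
  }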

This patch does exactly that, first relaxing the selection of
candidates to also include those which are addressable but do not
escape and then adding code to deal with the calls.  The
micro-benchmark that is also the (scan-dump) testcase in this patch
runs twice as fast with it as with current trunk.  Honza measured
its effect on the libjxl benchmark and it almost closes the
performance gap between Clang and GCC while not requiring excessive
inlining and thus code growth.

The patch disallows creation of replacements for such aggregates which
are also accessed with a precision smaller than their size because I
have observed that this led to excessive zero-extending of data
leading to slow-downs of perlbench (on some CPUs).  Apart from this
case I have not noticed any regressions, at least not so far.
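
As an example of the excluded case (a made-up illustration; in GCC's C
front end _Bool has TYPE_PRECISION 1 but occupies 8 bits, so accesses
to it have a precision smaller than their size):

  struct P { _Bool flag; int n; };
  void use (struct P *);

  int
  f (struct P p)
  {
    use (&p);                  /* p is passed by reference to a call */
    return p.flag ? p.n : 0;   /* integral access with precision < size */
  }

With the patch, no replacements are created for p in such a situation.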

Gimple call argument flags can tell if an argument is unused (and then
we do not need to generate any statements for it) or if it is not
written to (and then we do not need to generate statements loading
replacements from the original aggregate after the call statement).
Unfortunately, we cannot symmetrically use flags indicating that an
aggregate is not read to avoid re-constructing it before the call,
because the flags do not tell which parts of the aggregate were not
written to; after the call we load all replacements, and so all of
them need to have the correct value before the call.
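
In other words, the per-argument logic has roughly this shape (a
simplified sketch of sra_modify_call_arg in the diff below; the enum
values and helper functions are stand-ins, and the exact clobber-flag
test is the one in the patch):

  enum { EAF_UNUSED = 1, EAF_NO_DIRECT_CLOBBER = 2 };

  extern void store_replacements_before_call (void);
  extern void load_replacements_after_call (void);

  static void
  handle_by_ref_arg (int flags)
  {
    if (flags & EAF_UNUSED)
      return;                  /* the argument is unused: emit nothing */
    /* Always write the replacements back into the aggregate; the flags
       cannot tell which parts the callee reads.  */
    store_replacements_before_call ();
    if (!(flags & EAF_NO_DIRECT_CLOBBER))
      /* Reload only if the callee may have written to the aggregate.  */
      load_replacements_after_call ();
  }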

The patch passes bootstrap, lto-bootstrap and profiled-lto-bootstrap on
x86_64-linux and a very similar patch has also passed bootstrap and
testing on Aarch64-linux and ppc64le-linux (I'm re-running both on these
two architectures as I'm sending this).  OK for master?

Thanks,

Martin


gcc/ChangeLog:

2023-11-16  Martin Jambor  <mjambor@suse.cz>

	PR middle-end/109849
	* tree-sra.cc (passed_by_ref_in_call): New.
	(sra_initialize): Allocate passed_by_ref_in_call.
	(sra_deinitialize): Free passed_by_ref_in_call.
	(create_access): Add decl pool candidates only if they are not
	already candidates.
	(build_access_from_expr_1): Bail out on ADDR_EXPRs.
	(build_access_from_call_arg): New function.
	(asm_visit_addr): Rename to scan_visit_addr, change the
	disqualification dump message.
	(scan_function): Check taken addresses for all non-call statements,
	including phi nodes.  Process all call arguments, including the static
	chain, with build_access_from_call_arg.
	(maybe_add_sra_candidate): Relax needs_to_live_in_memory check to allow
	non-escaped local variables.
	(sort_and_splice_var_accesses): Disallow smaller-than-precision
	replacements for aggregates passed by reference to functions.
	(sra_modify_expr): Use a separate stmt iterator for adding statements
	before the processed statement and after it.
	(sra_modify_call_arg): New function.
	(sra_modify_assign): Adjust calls to sra_modify_expr.
	(sra_modify_function_body): Likewise, use sra_modify_call_arg to
	process call arguments, including the static chain.

gcc/testsuite/ChangeLog:

2023-11-03  Martin Jambor  <mjambor@suse.cz>

	PR middle-end/109849
	* g++.dg/tree-ssa/pr109849.C: New test.
	* gfortran.dg/pr43984.f90: Added -fno-tree-sra to dg-options.
---
 gcc/testsuite/g++.dg/tree-ssa/pr109849.C |  31 +++
 gcc/testsuite/gfortran.dg/pr43984.f90    |   2 +-
 gcc/tree-sra.cc                          | 244 ++++++++++++++++++-----
 3 files changed, 231 insertions(+), 46 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr109849.C

Comments

Richard Biener Nov. 17, 2023, 7:50 a.m. UTC | #1
On Thu, 16 Nov 2023, Martin Jambor wrote:

> Hello,
> 
> PR109849 shows that a loop that heavily pushes and pops from a stack
> implemented by a C++ std::vector results in slow code, mainly because the
> vector structure is not split by SRA and so we end up with many loads
> and stores into it.  This is because it is passed by reference
> to (re)allocation methods and so needs to live in memory, even though
> it does not escape from them and so we could SRA it if we
> re-constructed it before the call and then separated it to distinct
> replacements afterwards.
> 
> This patch does exactly that, first relaxing the selection of
> candidates to also include those which are addressable but do not
> escape and then adding code to deal with the calls.  The
> micro-benchmark that is also the (scan-dump) testcase in this patch
> runs twice as fast with it as with current trunk.  Honza measured
> its effect on the libjxl benchmark and it almost closes the
> performance gap between Clang and GCC while not requiring excessive
> inlining and thus code growth.
> 
> The patch disallows creation of replacements for such aggregates which
> are also accessed with a precision smaller than their size because I
> have observed that this led to excessive zero-extending of data
> leading to slow-downs of perlbench (on some CPUs).  Apart from this
> case I have not noticed any regressions, at least not so far.
> 
> Gimple call argument flags can tell if an argument is unused (and then
> we do not need to generate any statements for it) or if it is not
> written to (and then we do not need to generate statements loading
> replacements from the original aggregate after the call statement).
> Unfortunately, we cannot symmetrically use flags indicating that an
> aggregate is not read to avoid re-constructing it before the call,
> because the flags do not tell which parts of the aggregate were not
> written to; after the call we load all replacements, and so all of
> them need to have the correct value before the call.
> 
> The patch passes bootstrap, lto-bootstrap and profiled-lto-bootstrap on
> x86_64-linux and a very similar patch has also passed bootstrap and
> testing on Aarch64-linux and ppc64le-linux (I'm re-running both on these
> two architectures as I'm sending this).  OK for master?
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2023-11-16  Martin Jambor  <mjambor@suse.cz>
> 
> 	PR middle-end/109849
> 	* tree-sra.cc (passed_by_ref_in_call): New.
> 	(sra_initialize): Allocate passed_by_ref_in_call.
> 	(sra_deinitialize): Free passed_by_ref_in_call.
> 	(create_access): Add decl pool candidates only if they are not
> 	already candidates.
> 	(build_access_from_expr_1): Bail out on ADDR_EXPRs.
> 	(build_access_from_call_arg): New function.
> 	(asm_visit_addr): Rename to scan_visit_addr, change the
> 	disqualification dump message.
> 	(scan_function): Check taken addresses for all non-call statements,
> 	including phi nodes.  Process all call arguments, including the static
> 	chain, with build_access_from_call_arg.
> 	(maybe_add_sra_candidate): Relax needs_to_live_in_memory check to allow
> 	non-escaped local variables.
> 	(sort_and_splice_var_accesses): Disallow smaller-than-precision
> 	replacements for aggregates passed by reference to functions.
> 	(sra_modify_expr): Use a separate stmt iterator for adding statements
> 	before the processed statement and after it.
> 	(sra_modify_call_arg): New function.
> 	(sra_modify_assign): Adjust calls to sra_modify_expr.
> 	(sra_modify_function_body): Likewise, use sra_modify_call_arg to
> 	process call arguments, including the static chain.
> 
> gcc/testsuite/ChangeLog:
> 
> 2023-11-03  Martin Jambor  <mjambor@suse.cz>
> 
> 	PR middle-end/109849
> 	* g++.dg/tree-ssa/pr109849.C: New test.
> 	* gfortran.dg/pr43984.f90: Added -fno-tree-sra to dg-options.
> ---
>  gcc/testsuite/g++.dg/tree-ssa/pr109849.C |  31 +++
>  gcc/testsuite/gfortran.dg/pr43984.f90    |   2 +-
>  gcc/tree-sra.cc                          | 244 ++++++++++++++++++-----
>  3 files changed, 231 insertions(+), 46 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr109849.C
> 
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr109849.C b/gcc/testsuite/g++.dg/tree-ssa/pr109849.C
> new file mode 100644
> index 00000000000..cd348c0f590
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr109849.C
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-sra" } */
> +
> +#include <vector>
> +typedef unsigned int uint32_t;
> +std::pair<uint32_t, uint32_t> pair;
> +void
> +test()
> +{
> +        std::vector<std::pair<uint32_t, uint32_t> > stack;
> +        stack.push_back (pair);
> +        while (!stack.empty()) {
> +                std::pair<uint32_t, uint32_t> cur = stack.back();
> +                stack.pop_back();
> +                if (!cur.first)
> +                {
> +                        cur.second++;
> +                        stack.push_back (cur);
> +                }
> +                if (cur.second > 10000)
> +                        break;
> +        }
> +}
> +int
> +main()
> +{
> +        for (int i = 0; i < 10000; i++)
> +          test();
> +}
> +
> +/* { dg-final { scan-tree-dump "Created a replacement for stack offset" "sra"} } */
> diff --git a/gcc/testsuite/gfortran.dg/pr43984.f90 b/gcc/testsuite/gfortran.dg/pr43984.f90
> index 130d114462c..dce26b0ef3b 100644
> --- a/gcc/testsuite/gfortran.dg/pr43984.f90
> +++ b/gcc/testsuite/gfortran.dg/pr43984.f90
> @@ -1,5 +1,5 @@
>  ! { dg-do compile }
> -! { dg-options "-O2 -fno-tree-dominator-opts -fdump-tree-pre" }
> +! { dg-options "-O2 -fno-tree-dominator-opts -fdump-tree-pre -fno-tree-sra" }
>  module test
>  
>     type shell1quartet_type
> diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
> index b985dee6964..676931d5383 100644
> --- a/gcc/tree-sra.cc
> +++ b/gcc/tree-sra.cc
> @@ -337,6 +337,9 @@ static bitmap should_scalarize_away_bitmap, cannot_scalarize_away_bitmap;
>     because this would produce non-constant expressions (e.g. Ada).  */
>  static bitmap disqualified_constants;
>  
> +/* Bitmap of candidates which are passed by reference in call arguments.  */
> +static bitmap passed_by_ref_in_call;
> +
>  /* Obstack for creation of fancy names.  */
>  static struct obstack name_obstack;
>  
> @@ -717,6 +720,7 @@ sra_initialize (void)
>    should_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
>    cannot_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
>    disqualified_constants = BITMAP_ALLOC (NULL);
> +  passed_by_ref_in_call = BITMAP_ALLOC (NULL);
>    gcc_obstack_init (&name_obstack);
>    base_access_vec = new hash_map<tree, auto_vec<access_p> >;
>    memset (&sra_stats, 0, sizeof (sra_stats));
> @@ -733,6 +737,7 @@ sra_deinitialize (void)
>    BITMAP_FREE (should_scalarize_away_bitmap);
>    BITMAP_FREE (cannot_scalarize_away_bitmap);
>    BITMAP_FREE (disqualified_constants);
> +  BITMAP_FREE (passed_by_ref_in_call);
>    access_pool.release ();
>    assign_link_pool.release ();
>    obstack_free (&name_obstack, NULL);
> @@ -905,9 +910,9 @@ create_access (tree expr, gimple *stmt, bool write)
>  				  &reverse);
>  
>    /* For constant-pool entries, check we can substitute the constant value.  */
> -  if (constant_decl_p (base))
> +  if (constant_decl_p (base)
> +      && !bitmap_bit_p (disqualified_constants, DECL_UID (base)))
>      {
> -      gcc_assert (!bitmap_bit_p (disqualified_constants, DECL_UID (base)));
>        if (expr != base
>  	  && !is_gimple_reg_type (TREE_TYPE (expr))
>  	  && dump_file && (dump_flags & TDF_DETAILS))
> @@ -1135,6 +1140,17 @@ sra_handled_bf_read_p (tree expr)
>  static struct access *
>  build_access_from_expr_1 (tree expr, gimple *stmt, bool write)
>  {
> +  /* We only allow ADDR_EXPRs in arguments of function calls and those must
> +     have been dealt with in build_access_from_call_arg.  Any other address
> +     taking should have been caught by scan_visit_addr.   */
> +  if (TREE_CODE (expr) == ADDR_EXPR)
> +    {
> +      tree base = get_base_address (TREE_OPERAND (expr, 0));
> +      gcc_assert (!DECL_P (base)
> +		  || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)));
> +      return NULL;
> +    }
> +
>    struct access *ret = NULL;
>    bool partial_ref;
>  
> @@ -1223,6 +1239,40 @@ build_access_from_expr (tree expr, gimple *stmt, bool write)
>    return false;
>  }
>  
> +/* Scan expression EXPR which is an argument of a call and create access
> +   structures for all accesses to candidates for scalarization.  Return true if
> +   any access has been inserted.  STMT must be the statement from which the
> +   expression is taken.  */
> +
> +static bool
> +build_access_from_call_arg (tree expr, gimple *stmt)
> +{
> +  if (TREE_CODE (expr) == ADDR_EXPR)
> +    {
> +      tree base = get_base_address (TREE_OPERAND (expr, 0));
> +      bool read = build_access_from_expr (base, stmt, false);
> +      bool write = build_access_from_expr (base, stmt, true);
> +      if (read || write)
> +	{
> +	  if (dump_file && (dump_flags & TDF_DETAILS))
> +	    {
> +	      fprintf (dump_file, "Allowed ADDR_EXPR of ");
> +	      print_generic_expr (dump_file, base);
> +	      fprintf (dump_file, " because of ");
> +	      print_gimple_stmt (dump_file, stmt, 0);
> +	      fprintf (dump_file, "\n");
> +	    }
> +	  bitmap_set_bit (passed_by_ref_in_call, DECL_UID (base));
> +	  return true;
> +	}
> +      else
> +	return false;
> +    }
> +
> +  return build_access_from_expr (expr, stmt, false);
> +}
> +
> +
>  /* Return the single non-EH successor edge of BB or NULL if there is none or
>     more than one.  */
>  
> @@ -1364,16 +1414,18 @@ build_accesses_from_assign (gimple *stmt)
>    return lacc || racc;
>  }
>  
> -/* Callback of walk_stmt_load_store_addr_ops visit_addr used to determine
> -   GIMPLE_ASM operands with memory constrains which cannot be scalarized.  */
> +/* Callback of walk_stmt_load_store_addr_ops visit_addr used to detect taking
> +   addresses of candidates at places which are not call arguments.  Such
> +   candidates are disqualified from SRA.  This also applies to GIMPLE_ASM
> +   operands with memory constraints which cannot be scalarized.  */
>  
>  static bool
> -asm_visit_addr (gimple *, tree op, tree, void *)
> +scan_visit_addr (gimple *, tree op, tree, void *)
>  {
>    op = get_base_address (op);
>    if (op
>        && DECL_P (op))
> -    disqualify_candidate (op, "Non-scalarizable GIMPLE_ASM operand.");
> +    disqualify_candidate (op, "Address taken in a non-call-argument context.");
>  
>    return false;
>  }
> @@ -1390,12 +1442,20 @@ scan_function (void)
>    FOR_EACH_BB_FN (bb, cfun)
>      {
>        gimple_stmt_iterator gsi;
> +      for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +	walk_stmt_load_store_addr_ops (gsi_stmt (gsi), NULL, NULL, NULL,
> +				       scan_visit_addr);
> +
>        for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>  	{
>  	  gimple *stmt = gsi_stmt (gsi);
>  	  tree t;
>  	  unsigned i;
>  
> +	  if (gimple_code (stmt) != GIMPLE_CALL)
> +	    walk_stmt_load_store_addr_ops (stmt, NULL, NULL, NULL,
> +					   scan_visit_addr);
> +
>  	  switch (gimple_code (stmt))
>  	    {
>  	    case GIMPLE_RETURN:
> @@ -1410,8 +1470,11 @@ scan_function (void)
>  
>  	    case GIMPLE_CALL:
>  	      for (i = 0; i < gimple_call_num_args (stmt); i++)
> -		ret |= build_access_from_expr (gimple_call_arg (stmt, i),
> -					       stmt, false);
> +		ret |= build_access_from_call_arg (gimple_call_arg (stmt, i),
> +						   stmt);
> +	      if (gimple_call_chain (stmt))
> +		ret |= build_access_from_call_arg (gimple_call_chain (stmt),
> +						   stmt);
>  
>  	      t = gimple_call_lhs (stmt);
>  	      if (t && !disqualify_if_bad_bb_terminating_stmt (stmt, t, NULL))
> @@ -1428,8 +1491,6 @@ scan_function (void)
>  	    case GIMPLE_ASM:
>  	      {
>  		gasm *asm_stmt = as_a <gasm *> (stmt);
> -		walk_stmt_load_store_addr_ops (asm_stmt, NULL, NULL, NULL,
> -					       asm_visit_addr);
>  		for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++)
>  		  {
>  		    t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
> @@ -1920,10 +1981,19 @@ maybe_add_sra_candidate (tree var)
>        reject (var, "not aggregate");
>        return false;
>      }
> -  /* Allow constant-pool entries that "need to live in memory".  */
> -  if (needs_to_live_in_memory (var) && !constant_decl_p (var))
> +
> +  if ((is_global_var (var)
> +       /* There are cases where non-addressable variables fail the
> +	  pt_solutions_check test, e.g. in gcc.dg/uninit-40.c.  */
> +       || (TREE_ADDRESSABLE (var)
> +	   && pt_solution_includes (&cfun->gimple_df->escaped, var))

so it seems this is the "correctness" test rather than using
the call argument flags?  I'll note that this may be overly
conservative.

For the call handling above I wondered about return-slot-opt calls
where the address of the LHS escapes to the call - if the points-to
check is the correctness check that should still work out of course
(but subject to PR109945).

> +       || (TREE_CODE (var) == RESULT_DECL
> +	   && !DECL_BY_REFERENCE (var)
> +	   && aggregate_value_p (var, current_function_decl)))
> +      /* Allow constant-pool entries that "need to live in memory".  */
> +      && !constant_decl_p (var))
>      {
> -      reject (var, "needs to live in memory");
> +      reject (var, "needs to live in memory and escapes or global");
>        return false;
>      }
>    if (TREE_THIS_VOLATILE (var))
> @@ -2122,6 +2192,21 @@ sort_and_splice_var_accesses (tree var)
>  	gcc_assert (access->offset >= low
>  		    && access->offset + access->size <= high);
>  
> +      if (INTEGRAL_TYPE_P (access->type)
> +	  && TYPE_PRECISION (access->type) != access->size
> +	  && bitmap_bit_p (passed_by_ref_in_call, DECL_UID (access->base)))
> +	{
> +	  if (dump_file && (dump_flags & TDF_DETAILS))
> +	    {
> +	      fprintf (dump_file, "Won't scalarize ");
> +	      print_generic_expr (dump_file, access->base);
> +	      fprintf (dump_file, "(%d), it is passed by reference to a call "
> +		       "and there are accesses with presicion not covering "
> +		       "their type size.", DECL_UID (access->base));
> +	    }
> +	  return NULL;
> +	}

I think this warrants a comment that this is to avoid code pessimization
rather than a correctness issue.

> +
>        grp_same_access_path = path_comparable_for_same_access (access->expr);
>  
>        j = i + 1;
> @@ -3774,12 +3859,18 @@ get_access_for_expr (tree expr)
>  
>  /* Replace the expression EXPR with a scalar replacement if there is one and
>     generate other statements to do type conversion or subtree copying if
> -   necessary.  GSI is used to place newly created statements, WRITE is true if
> -   the expression is being written to (it is on a LHS of a statement or output
> -   in an assembly statement).  */
> +   necessary.  WRITE is true if the expression is being written to (it is on a
> +   LHS of a statement or output in an assembly statement).  STMT_GSI is used to
> +   place newly created statements before the processed statement, REFRESH_GSI
> +   is used to place them afterwards - unless the processed statement must end a
> +   BB in which case it is placed on the outgoing non-EH edge.  REFRESH_GSI and
> +   is then used to continue iteration over the BB.  If sra_modify_expr is
> +   called only once with WRITE equal to true on a given statement, both
> +   iterator parameters can point to the same one.  */
>  
>  static bool
> -sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
> +sra_modify_expr (tree *expr, bool write, gimple_stmt_iterator *stmt_gsi,
> +		 gimple_stmt_iterator *refresh_gsi)
>  {
>    location_t loc;
>    struct access *access;
> @@ -3806,12 +3897,12 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
>    type = TREE_TYPE (*expr);
>    orig_expr = *expr;
>  
> -  loc = gimple_location (gsi_stmt (*gsi));
> +  loc = gimple_location (gsi_stmt (*stmt_gsi));
>    gimple_stmt_iterator alt_gsi = gsi_none ();
> -  if (write && stmt_ends_bb_p (gsi_stmt (*gsi)))
> +  if (write && stmt_ends_bb_p (gsi_stmt (*stmt_gsi)))
>      {
> -      alt_gsi = gsi_start_edge (single_non_eh_succ (gsi_bb (*gsi)));
> -      gsi = &alt_gsi;
> +      alt_gsi = gsi_start_edge (single_non_eh_succ (gsi_bb (*stmt_gsi)));
> +      refresh_gsi = &alt_gsi;
>      }
>  
>    if (access->grp_to_be_replaced)
> @@ -3831,7 +3922,8 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
>  	{
>  	  tree ref;
>  
> -	  ref = build_ref_for_model (loc, orig_expr, 0, access, gsi, false);
> +	  ref = build_ref_for_model (loc, orig_expr, 0, access, stmt_gsi,
> +				     false);
>  
>  	  if (partial_cplx_access)
>  	    {
> @@ -3847,7 +3939,7 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
>  		  tree tmp = make_ssa_name (type);
>  		  gassign *stmt = gimple_build_assign (tmp, t);
>  		  /* This is always a read. */
> -		  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
> +		  gsi_insert_before (stmt_gsi, stmt, GSI_SAME_STMT);
>  		  t = tmp;
>  		}
>  	      *expr = t;
> @@ -3857,22 +3949,23 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
>  	      gassign *stmt;
>  
>  	      if (access->grp_partial_lhs)
> -		ref = force_gimple_operand_gsi (gsi, ref, true, NULL_TREE,
> -						 false, GSI_NEW_STMT);
> +		ref = force_gimple_operand_gsi (refresh_gsi, ref, true,
> +						NULL_TREE, false, GSI_NEW_STMT);
>  	      stmt = gimple_build_assign (repl, ref);
>  	      gimple_set_location (stmt, loc);
> -	      gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
> +	      gsi_insert_after (refresh_gsi, stmt, GSI_NEW_STMT);
>  	    }
>  	  else
>  	    {
>  	      gassign *stmt;
>  
>  	      if (access->grp_partial_lhs)
> -		repl = force_gimple_operand_gsi (gsi, repl, true, NULL_TREE,
> -						 true, GSI_SAME_STMT);
> +		repl = force_gimple_operand_gsi (stmt_gsi, repl, true,
> +						 NULL_TREE, true,
> +						 GSI_SAME_STMT);
>  	      stmt = gimple_build_assign (ref, repl);
>  	      gimple_set_location (stmt, loc);
> -	      gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
> +	      gsi_insert_before (stmt_gsi, stmt, GSI_SAME_STMT);
>  	    }
>  	}
>        else
> @@ -3899,8 +3992,8 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
>      {
>        gdebug *ds = gimple_build_debug_bind (get_access_replacement (access),
>  					    NULL_TREE,
> -					    gsi_stmt (*gsi));
> -      gsi_insert_after (gsi, ds, GSI_NEW_STMT);
> +					    gsi_stmt (*stmt_gsi));
> +      gsi_insert_after (stmt_gsi, ds, GSI_NEW_STMT);
>      }
>  
>    if (access->first_child && !TREE_READONLY (access->base))
> @@ -3918,8 +4011,58 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
>  	start_offset = chunk_size = 0;
>  
>        generate_subtree_copies (access->first_child, orig_expr, access->offset,
> -			       start_offset, chunk_size, gsi, write, write,
> -			       loc);
> +			       start_offset, chunk_size,
> +			       write ? refresh_gsi : stmt_gsi,
> +			       write, write, loc);
> +    }
> +  return true;
> +}
> +
> +/* If EXPR, which must be a call argument, is an ADDR_EXPR, generate writes and
> +   reads from its base before and after the call statement given in CALL_GSI
> +   and return true if any copying took place.  Otherwise call sra_modify_expr
> +   on EXPR and return its value.  FLAGS is what gimple_call_arg_flags
> +   returns for the given parameter.  */
> +
> +static bool
> +sra_modify_call_arg (tree *expr, gimple_stmt_iterator *call_gsi,
> +		     gimple_stmt_iterator *refresh_gsi, int flags)
> +{
> +  if (TREE_CODE (*expr) != ADDR_EXPR)
> +    return sra_modify_expr (expr, false, call_gsi, refresh_gsi);
> +
> +  if (flags & EAF_UNUSED)
> +    return false;

So in this case we leave around the address of the DECL but with
"wrong" content.  That might lead to an odd debug experience, but well.
For optimization wouldn't it be better to substitute 'nullptr' so
we can elide the DECL (not allocate stack space for it)?  I think
EAF_UNUSED means we're not inspecting the pointer value either.

> +
> +  tree base = get_base_address (TREE_OPERAND (*expr, 0));
> +  if (!DECL_P (base))
> +    return false;
> +  struct access *access = get_access_for_expr (base);
> +  if (!access)
> +    return false;
> +
> +  gimple *stmt = gsi_stmt (*call_gsi);
> +  location_t loc = gimple_location (stmt);
> +  generate_subtree_copies (access, base, 0, 0, 0, call_gsi, false, false,
> +			   loc);
> +
> +  if ((flags & (EAF_NO_DIRECT_CLOBBER | EAF_NO_INDIRECT_CLOBBER))
> +      == (EAF_NO_DIRECT_CLOBBER | EAF_NO_INDIRECT_CLOBBER))
> +    return true;

Why both EAF_NO_DIRECT_CLOBBER and EAF_NO_INDIRECT_CLOBBER?  I think
EAF_NO_DIRECT_CLOBBER is enough unless you think that you didn't rule
out the decl pointing to itself by the escaping means?  Warrants an
explanatory comment.

> +
> +  if (!stmt_ends_bb_p (stmt))
> +    generate_subtree_copies (access, base, 0, 0, 0, refresh_gsi, true,
> +			     true, loc);
> +  else
> +    {
> +      edge e;
> +      edge_iterator ei;
> +      FOR_EACH_EDGE (e, ei, gsi_bb (*call_gsi)->succs)
> +	{
> +	  gimple_stmt_iterator alt_gsi = gsi_start_edge (e);
> +	  generate_subtree_copies (access, base, 0, 0, 0, &alt_gsi, true,
> +				   true, loc);

note for non-return-slot-opt it's important to avoid materializing
writes to the LHS on the exceptional path.

Also note you can't insert on an abnormal edge so I hope you
disqualify decls that would require this kind of call processing?
IMHO for cost reasons it might be a good idea to also avoid the
exceptional path? (but maybe it's important for the standard library
cases).

Otherwise the patch looks OK.

I wonder if it's possible to add some basic testcases not involving
the standard library though.

Thanks,
Richard.

> +	}
>      }
>    return true;
>  }
> @@ -4279,9 +4422,9 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
>        || TREE_CODE (lhs) == BIT_FIELD_REF)
>      {
>        modify_this_stmt = sra_modify_expr (gimple_assign_rhs1_ptr (stmt),
> -					  gsi, false);
> +					  false, gsi, gsi);
>        modify_this_stmt |= sra_modify_expr (gimple_assign_lhs_ptr (stmt),
> -					   gsi, true);
> +					   true, gsi, gsi);
>        return modify_this_stmt ? SRA_AM_MODIFIED : SRA_AM_NONE;
>      }
>  
> @@ -4603,7 +4746,7 @@ sra_modify_function_body (void)
>  	    case GIMPLE_RETURN:
>  	      t = gimple_return_retval_ptr (as_a <greturn *> (stmt));
>  	      if (*t != NULL_TREE)
> -		modified |= sra_modify_expr (t, &gsi, false);
> +		modified |= sra_modify_expr (t, false, &gsi, &gsi);
>  	      break;
>  
>  	    case GIMPLE_ASSIGN:
> @@ -4622,33 +4765,44 @@ sra_modify_function_body (void)
>  		}
>  	      else
>  		{
> +		  gcall *call = as_a <gcall *> (stmt);
> +		  gimple_stmt_iterator call_gsi = gsi;
> +
>  		  /* Operands must be processed before the lhs.  */
> -		  for (i = 0; i < gimple_call_num_args (stmt); i++)
> +		  for (i = 0; i < gimple_call_num_args (call); i++)
>  		    {
> -		      t = gimple_call_arg_ptr (stmt, i);
> -		      modified |= sra_modify_expr (t, &gsi, false);
> +		      int flags = gimple_call_arg_flags (call, i);
> +		      t = gimple_call_arg_ptr (call, i);
> +		      modified |= sra_modify_call_arg (t, &call_gsi, &gsi, flags);
>  		    }
> -
> -	      	  if (gimple_call_lhs (stmt))
> +		  if (gimple_call_chain (call))
> +		    {
> +		      t = gimple_call_chain_ptr (call);
> +		      int flags = gimple_call_static_chain_flags (call);
> +		      modified |= sra_modify_call_arg (t, &call_gsi, &gsi,
> +						       flags);
> +		    }
> +		  if (gimple_call_lhs (call))
>  		    {
> -		      t = gimple_call_lhs_ptr (stmt);
> -		      modified |= sra_modify_expr (t, &gsi, true);
> +		      t = gimple_call_lhs_ptr (call);
> +		      modified |= sra_modify_expr (t, true, &call_gsi, &gsi);
>  		    }
>  		}
>  	      break;
>  
>  	    case GIMPLE_ASM:
>  	      {
> +		gimple_stmt_iterator stmt_gsi = gsi;
>  		gasm *asm_stmt = as_a <gasm *> (stmt);
>  		for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++)
>  		  {
>  		    t = &TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
> -		    modified |= sra_modify_expr (t, &gsi, false);
> +		    modified |= sra_modify_expr (t, false, &stmt_gsi, &gsi);
>  		  }
>  		for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++)
>  		  {
>  		    t = &TREE_VALUE (gimple_asm_output_op (asm_stmt, i));
> -		    modified |= sra_modify_expr (t, &gsi, true);
> +		    modified |= sra_modify_expr (t, true, &stmt_gsi, &gsi);
>  		  }
>  	      }
>  	      break;
>
Martin Jambor Nov. 24, 2023, 1:48 p.m. UTC | #2
Hello,

thanks a lot for your review.

On Fri, Nov 17 2023, Richard Biener wrote:
> On Thu, 16 Nov 2023, Martin Jambor wrote:
>
>> Hello,
>> 
>> PR109849 shows that a loop that heavily pushes and pops from a stack
>> implemented by a C++ std::vector results in slow code, mainly because the
>> vector structure is not split by SRA and so we end up with many loads
>> and stores into it.  This is because it is passed by reference
>> to (re)allocation methods and so needs to live in memory, even though
>> it does not escape from them and so we could SRA it if we
>> re-constructed it before the call and then separated it to distinct
>> replacements afterwards.
>> 
>> This patch does exactly that, first relaxing the selection of
>> candidates to also include those which are addressable but do not
>> escape and then adding code to deal with the calls.  The
>> micro-benchmark that is also the (scan-dump) testcase in this patch
>> runs twice as fast with it as with current trunk.  Honza measured
>> its effect on the libjxl benchmark and it almost closes the
>> performance gap between Clang and GCC while not requiring excessive
>> inlining and thus code growth.
>> 
>> The patch disallows creation of replacements for such aggregates which
>> are also accessed with a precision smaller than their size because I
>> have observed that this led to excessive zero-extending of data
>> leading to slow-downs of perlbench (on some CPUs).  Apart from this
>> case I have not noticed any regressions, at least not so far.
>> 
>> Gimple call argument flags can tell if an argument is unused (and then
>> we do not need to generate any statements for it) or if it is not
>> written to (and then we do not need to generate statements loading
>> replacements from the original aggregate after the call statement).
>> Unfortunately, we cannot symmetrically use flags indicating that an
>> aggregate is not read to avoid re-constructing it before the call,
>> because the flags do not tell which parts of the aggregate were not
>> written to; after the call we load all replacements, and so all of
>> them need to have the correct value before the call.
>> 
>> The patch passes bootstrap, lto-bootstrap and profiled-lto-bootstrap on
>> x86_64-linux and a very similar patch has also passed bootstrap and
>> testing on Aarch64-linux and ppc64le-linux (I'm re-running both on these
>> two architectures as I'm sending this).  OK for master?
>> 
>> Thanks,
>> 
>> Martin
>> 

[...]

>> @@ -1920,10 +1981,19 @@ maybe_add_sra_candidate (tree var)
>>        reject (var, "not aggregate");
>>        return false;
>>      }
>> -  /* Allow constant-pool entries that "need to live in memory".  */
>> -  if (needs_to_live_in_memory (var) && !constant_decl_p (var))
>> +
>> +  if ((is_global_var (var)
>> +       /* There are cases where non-addressable variables fail the
>> +	  pt_solutions_check test, e.g. in gcc.dg/uninit-40.c.  */
>> +       || (TREE_ADDRESSABLE (var)
>> +	   && pt_solution_includes (&cfun->gimple_df->escaped, var))
>
> so it seems this is the "correctness" test rather than using
> the call argument flags?  I'll note that this may be overly
> conservative.

It is, but with ipa-modref it seems to work fairly well.

>
> For the call handling above I wondered about return-slot-opt calls
> where the address of the LHS escapes to the call - if the points-to
> check is the correctness check that should still work out of course
> (but subject to PR109945).

Hm, I wonder what the implications of that PR are for SRA, but it is a
different issue; this patch does not touch the handling of LHSs.

>
>> +       || (TREE_CODE (var) == RESULT_DECL
>> +	   && !DECL_BY_REFERENCE (var)
>> +	   && aggregate_value_p (var, current_function_decl)))
>> +      /* Allow constant-pool entries that "need to live in memory".  */
>> +      && !constant_decl_p (var))
>>      {
>> -      reject (var, "needs to live in memory");
>> +      reject (var, "needs to live in memory and escapes or global");
>>        return false;
>>      }
>>    if (TREE_THIS_VOLATILE (var))
>> @@ -2122,6 +2192,21 @@ sort_and_splice_var_accesses (tree var)
>>  	gcc_assert (access->offset >= low
>>  		    && access->offset + access->size <= high);
>>  
>> +      if (INTEGRAL_TYPE_P (access->type)
>> +	  && TYPE_PRECISION (access->type) != access->size
>> +	  && bitmap_bit_p (passed_by_ref_in_call, DECL_UID (access->base)))
>> +	{
>> +	  if (dump_file && (dump_flags & TDF_DETAILS))
>> +	    {
>> +	      fprintf (dump_file, "Won't scalarize ");
>> +	      print_generic_expr (dump_file, access->base);
>> +	      fprintf (dump_file, "(%d), it is passed by reference to a call "
>> +		       "and there are accesses with presicion not covering "
>> +		       "their type size.", DECL_UID (access->base));
>> +	    }
>> +	  return NULL;
>> +	}
>
> I think this warrants a comment that this is to avoid code pessimization
> rather than a correctness issue.

Done.

>
>> +
>>        grp_same_access_path = path_comparable_for_same_access (access->expr);
>>  
>>        j = i + 1;
>> @@ -3774,12 +3859,18 @@ get_access_for_expr (tree expr)
>>  
>>  /* Replace the expression EXPR with a scalar replacement if there is one and
>>     generate other statements to do type conversion or subtree copying if
>> -   necessary.  GSI is used to place newly created statements, WRITE is true if
>> -   the expression is being written to (it is on a LHS of a statement or output
>> -   in an assembly statement).  */
>> +   necessary.  WRITE is true if the expression is being written to (it is on a
>> +   LHS of a statement or output in an assembly statement).  STMT_GSI is used to
>> +   place newly created statements before the processed statement, REFRESH_GSI
>> +   is used to place them afterwards - unless the processed statement must end a
>> +   BB in which case it is placed on the outgoing non-EH edge.  REFRESH_GSI and
>> +   is then used to continue iteration over the BB.  If sra_modify_expr is
>> +   called only once with WRITE equal to true on a given statement, both
>> +   iterator parameters can point to the same one.  */
>>  
>>  static bool
>> -sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
>> +sra_modify_expr (tree *expr, bool write, gimple_stmt_iterator *stmt_gsi,
>> +		 gimple_stmt_iterator *refresh_gsi)
>>  {
>>    location_t loc;
>>    struct access *access;
>> @@ -3806,12 +3897,12 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
>>    type = TREE_TYPE (*expr);
>>    orig_expr = *expr;
>>  
>> -  loc = gimple_location (gsi_stmt (*gsi));
>> +  loc = gimple_location (gsi_stmt (*stmt_gsi));
>>    gimple_stmt_iterator alt_gsi = gsi_none ();
>> -  if (write && stmt_ends_bb_p (gsi_stmt (*gsi)))
>> +  if (write && stmt_ends_bb_p (gsi_stmt (*stmt_gsi)))
>>      {
>> -      alt_gsi = gsi_start_edge (single_non_eh_succ (gsi_bb (*gsi)));
>> -      gsi = &alt_gsi;
>> +      alt_gsi = gsi_start_edge (single_non_eh_succ (gsi_bb (*stmt_gsi)));
>> +      refresh_gsi = &alt_gsi;
>>      }
>>  
>>    if (access->grp_to_be_replaced)
>> @@ -3831,7 +3922,8 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
>>  	{
>>  	  tree ref;
>>  
>> -	  ref = build_ref_for_model (loc, orig_expr, 0, access, gsi, false);
>> +	  ref = build_ref_for_model (loc, orig_expr, 0, access, stmt_gsi,
>> +				     false);
>>  
>>  	  if (partial_cplx_access)
>>  	    {
>> @@ -3847,7 +3939,7 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
>>  		  tree tmp = make_ssa_name (type);
>>  		  gassign *stmt = gimple_build_assign (tmp, t);
>>  		  /* This is always a read. */
>> -		  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
>> +		  gsi_insert_before (stmt_gsi, stmt, GSI_SAME_STMT);
>>  		  t = tmp;
>>  		}
>>  	      *expr = t;
>> @@ -3857,22 +3949,23 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
>>  	      gassign *stmt;
>>  
>>  	      if (access->grp_partial_lhs)
>> -		ref = force_gimple_operand_gsi (gsi, ref, true, NULL_TREE,
>> -						 false, GSI_NEW_STMT);
>> +		ref = force_gimple_operand_gsi (refresh_gsi, ref, true,
>> +						NULL_TREE, false, GSI_NEW_STMT);
>>  	      stmt = gimple_build_assign (repl, ref);
>>  	      gimple_set_location (stmt, loc);
>> -	      gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
>> +	      gsi_insert_after (refresh_gsi, stmt, GSI_NEW_STMT);
>>  	    }
>>  	  else
>>  	    {
>>  	      gassign *stmt;
>>  
>>  	      if (access->grp_partial_lhs)
>> -		repl = force_gimple_operand_gsi (gsi, repl, true, NULL_TREE,
>> -						 true, GSI_SAME_STMT);
>> +		repl = force_gimple_operand_gsi (stmt_gsi, repl, true,
>> +						 NULL_TREE, true,
>> +						 GSI_SAME_STMT);
>>  	      stmt = gimple_build_assign (ref, repl);
>>  	      gimple_set_location (stmt, loc);
>> -	      gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
>> +	      gsi_insert_before (stmt_gsi, stmt, GSI_SAME_STMT);
>>  	    }
>>  	}
>>        else
>> @@ -3899,8 +3992,8 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
>>      {
>>        gdebug *ds = gimple_build_debug_bind (get_access_replacement (access),
>>  					    NULL_TREE,
>> -					    gsi_stmt (*gsi));
>> -      gsi_insert_after (gsi, ds, GSI_NEW_STMT);
>> +					    gsi_stmt (*stmt_gsi));
>> +      gsi_insert_after (stmt_gsi, ds, GSI_NEW_STMT);
>>      }
>>  
>>    if (access->first_child && !TREE_READONLY (access->base))
>> @@ -3918,8 +4011,58 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
>>  	start_offset = chunk_size = 0;
>>  
>>        generate_subtree_copies (access->first_child, orig_expr, access->offset,
>> -			       start_offset, chunk_size, gsi, write, write,
>> -			       loc);
>> +			       start_offset, chunk_size,
>> +			       write ? refresh_gsi : stmt_gsi,
>> +			       write, write, loc);
>> +    }
>> +  return true;
>> +}
>> +
>> +/* If EXPR, which must be a call argument, is an ADDR_EXPR, generate writes and
>> +   reads from its base before and after the call statement given in CALL_GSI
>> +   and return true if any copying took place.  Otherwise call sra_modify_expr
>> +   on EXPR and return its value.  FLAGS is what gimple_call_arg_flags
>> +   returns for the given parameter.  */
>> +
>> +static bool
>> +sra_modify_call_arg (tree *expr, gimple_stmt_iterator *call_gsi,
>> +		     gimple_stmt_iterator *refresh_gsi, int flags)
>> +{
>> +  if (TREE_CODE (*expr) != ADDR_EXPR)
>> +    return sra_modify_expr (expr, false, call_gsi, refresh_gsi);
>> +
>> +  if (flags & EAF_UNUSED)
>> +    return false;
>
> So in this case we leave around the address of the DECL but with
> "wrong" content.  That might lead to an odd debug experience, but well.
> For optimization wouldn't it be better to substitute 'nullptr' so
> we can elide the DECL (not allocate stack space for it)?  I think
> EAF_UNUSED means we're not inspecting the pointer value either.

Using EAF_UNUSED is a bit of an afterthought.  I hope that often
IPA-SRA just removes such parameters, but when the called function is
externally visible and yet not interposable, IPA-SRA may not touch it
while IPA-modref can still derive its properties, and so this could
take effect.

I'll be happy to prepare a patch passing nullptr in these cases as a
follow-up.

>
>> +
>> +  tree base = get_base_address (TREE_OPERAND (*expr, 0));
>> +  if (!DECL_P (base))
>> +    return false;
>> +  struct access *access = get_access_for_expr (base);
>> +  if (!access)
>> +    return false;
>> +
>> +  gimple *stmt = gsi_stmt (*call_gsi);
>> +  location_t loc = gimple_location (stmt);
>> +  generate_subtree_copies (access, base, 0, 0, 0, call_gsi, false, false,
>> +			   loc);
>> +
>> +  if ((flags & (EAF_NO_DIRECT_CLOBBER | EAF_NO_INDIRECT_CLOBBER))
>> +      == (EAF_NO_DIRECT_CLOBBER | EAF_NO_INDIRECT_CLOBBER))
>> +    return true;
>
> Why both EAF_NO_DIRECT_CLOBBER and EAF_NO_INDIRECT_CLOBBER?  I think
> EAF_NO_DIRECT_CLOBBER is enough unless you think that you didn't rule
> out the decl pointing to itself by the escaping means?  Warrants an
> explanatory comment.

That was a mistake, I've changed it to use just EAF_NO_DIRECT_CLOBBER.

>
>> +
>> +  if (!stmt_ends_bb_p (stmt))
>> +    generate_subtree_copies (access, base, 0, 0, 0, refresh_gsi, true,
>> +			     true, loc);
>> +  else
>> +    {
>> +      edge e;
>> +      edge_iterator ei;
>> +      FOR_EACH_EDGE (e, ei, gsi_bb (*call_gsi)->succs)
>> +	{
>> +	  gimple_stmt_iterator alt_gsi = gsi_start_edge (e);
>> +	  generate_subtree_copies (access, base, 0, 0, 0, &alt_gsi, true,
>> +				   true, loc);
>
> note for non-return-slot-opt it's important to avoid materializing
> writes to the LHS on the exceptional path.

Yes, but LHSs are processed elsewhere (even without this patch).

>
> Also note you can't insert on an abnormal edge so I hope you
> disqualify decls that would require this kind of call processing?
> IMHO for cost reasons it might be a good idea to also avoid the
> exceptional path? (but maybe it's important for the standard library
> cases).

This check was actually missing and sure enough I was able to write a
testcase leading to ICEs.

> I wonder if it's possible to add some basic testcases not involving
> the standard library though.

Fair enough.  I have added a C testcase and a run-time C++ testcase that
verifies things work as expected when exceptions are raised, and a
testcase with (setjmp/longjmp) abnormal edges.

> Otherwise the patch looks OK.
>

Thank you.  The following has passed bootstrap and testing on
x86_64-linux.  I'll wait for LTO bootstrap and ppc64le-bootstrap to
finish and then plan to commit it.

Martin


Subject: [PATCH] sra: SRA of non-escaped aggregates passed by reference to calls

PR109849 shows that a loop that heavily pushes and pops from a stack
implemented by a C++ std::vector results in slow code, mainly because the
vector structure is not split by SRA and so we end up with many loads
and stores into it.  This is because it is passed by reference
to (re)allocation methods and so needs to live in memory, even though
it does not escape from them and so we could SRA it if we
re-constructed it before the call and then separated it to distinct
replacements afterwards.

This patch does exactly that, first relaxing the selection of
candidates to also include those which are addressable but do not
escape and then adding code to deal with the calls.  The
micro-benchmark that is also the (scan-dump) testcase in this patch
runs twice as fast with it as with current trunk.  Honza measured
its effect on the libjxl benchmark and it almost closes the
performance gap between Clang and GCC while not requiring excessive
inlining and thus code growth.

The patch disallows creation of replacements for such aggregates which
are also accessed with a precision smaller than their size because I
have observed that this led to excessive zero-extending of data
leading to slow-downs of perlbench (on some CPUs).  Apart from this
case I have not noticed any regressions, at least not so far.

Gimple call argument flags can tell if an argument is unused (and then
we do not need to generate any statements for it) or if it is not
written to (and then we do not need to generate statements loading
replacements from the original aggregate after the call statement).
Unfortunately, we cannot symmetrically use flags indicating that an
aggregate is not read to avoid re-constructing it before the call,
because the flags do not tell which parts of the aggregate were not
written to; after the call we load all replacements, and so all of
them need to have the correct value before the call.

This version of the patch also takes care to avoid attempts to modify
abnormal edges, something which was missing in the previous version.

gcc/ChangeLog:

2023-11-23  Martin Jambor  <mjambor@suse.cz>

	PR middle-end/109849
	* tree-sra.cc (passed_by_ref_in_call): New.
	(sra_initialize): Allocate passed_by_ref_in_call.
	(sra_deinitialize): Free passed_by_ref_in_call.
	(create_access): Add decl pool candidates only if they are not
	already candidates.
	(build_access_from_expr_1): Bail out on ADDR_EXPRs.
	(build_access_from_call_arg): New function.
	(asm_visit_addr): Rename to scan_visit_addr, change the
	disqualification dump message.
	(scan_function): Check taken addresses for all non-call statements,
	including phi nodes.  Process all call arguments, including the static
	chain, with build_access_from_call_arg.
	(maybe_add_sra_candidate): Relax needs_to_live_in_memory check to allow
	non-escaped local variables.
	(sort_and_splice_var_accesses): Disallow smaller-than-precision
	replacements for aggregates passed by reference to functions.
	(sra_modify_expr): Use a separate stmt iterator for adding statements
	before the processed statement and after it.
	(enum out_edge_check): New type.
	(abnormal_edge_after_stmt_p): New function.
	(sra_modify_call_arg): New function.
	(sra_modify_assign): Adjust calls to sra_modify_expr.
	(sra_modify_function_body): Likewise, use sra_modify_call_arg to
	process call arguments, including the static chain.

gcc/testsuite/ChangeLog:

2023-11-23  Martin Jambor  <mjambor@suse.cz>

	PR middle-end/109849
	* g++.dg/tree-ssa/pr109849.C: New test.
	* g++.dg/tree-ssa/sra-eh-1.C: Likewise.
	* gcc.dg/tree-ssa/pr109849.c: Likewise.
	* gcc.dg/tree-ssa/sra-longjmp-1.c: Likewise.
	* gfortran.dg/pr43984.f90: Added -fno-tree-sra to dg-options.
---
 gcc/testsuite/g++.dg/tree-ssa/pr109849.C      |  31 ++
 gcc/testsuite/g++.dg/tree-ssa/sra-eh-1.C      | 187 ++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/pr109849.c      |  60 ++++
 gcc/testsuite/gcc.dg/tree-ssa/sra-longjmp-1.c |  87 ++++++
 gcc/testsuite/gfortran.dg/pr43984.f90         |   2 +-
 gcc/tree-sra.cc                               | 287 +++++++++++++++---
 6 files changed, 607 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr109849.C
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/sra-eh-1.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109849.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sra-longjmp-1.c

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr109849.C b/gcc/testsuite/g++.dg/tree-ssa/pr109849.C
new file mode 100644
index 00000000000..cd348c0f590
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr109849.C
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sra" } */
+
+#include <vector>
+typedef unsigned int uint32_t;
+std::pair<uint32_t, uint32_t> pair;
+void
+test()
+{
+        std::vector<std::pair<uint32_t, uint32_t> > stack;
+        stack.push_back (pair);
+        while (!stack.empty()) {
+                std::pair<uint32_t, uint32_t> cur = stack.back();
+                stack.pop_back();
+                if (!cur.first)
+                {
+                        cur.second++;
+                        stack.push_back (cur);
+                }
+                if (cur.second > 10000)
+                        break;
+        }
+}
+int
+main()
+{
+        for (int i = 0; i < 10000; i++)
+          test();
+}
+
+/* { dg-final { scan-tree-dump "Created a replacement for stack offset" "sra"} } */
diff --git a/gcc/testsuite/g++.dg/tree-ssa/sra-eh-1.C b/gcc/testsuite/g++.dg/tree-ssa/sra-eh-1.C
new file mode 100644
index 00000000000..9c216a61475
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/sra-eh-1.C
@@ -0,0 +1,187 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-sra" } */
+
+struct E
+{
+  short c;
+  E (short param) : c(param) {};
+};
+
+volatile short vs = 0;
+long buf[16];
+long * volatile pbuf = (long *) &buf;
+
+static void __attribute__((noinline))
+unrelated_throwing (void)
+{
+  throw E (256);
+}
+
+struct S {
+  long  *b;
+  int c;
+  int s;
+};
+
+static void __attribute__((noinline))
+crazy_alloc_s (struct S *p)
+{
+  int s = p->c ? p->c * 2 : 16;
+
+  long *b = pbuf;
+  if (!b || s > 16)
+    {
+      p->s = -p->s;
+      throw E (127);
+    }
+
+  __builtin_memcpy (b, p->b, p->c);
+  p->b = b;
+  p->s = s;
+  pbuf = 0;
+  return;
+}
+
+long __attribute__((noipa))
+process (long v)
+{
+  return v + 1;
+}
+
+void __attribute__((noipa))
+check (int s, short c)
+{
+  if (vs < c)
+    vs = c;
+  if (c == 127)
+    {
+      if (s >= 0)
+	__builtin_abort ();
+    }
+  else
+    {
+      if (s != 0)
+	__builtin_abort ();
+    }
+  return;
+}
+
+
+void
+foo (void)
+{
+  struct S foo_stack;
+
+  foo_stack.c = 0;
+  crazy_alloc_s (&foo_stack);
+  foo_stack.b[0] = 1;
+  foo_stack.c = 1;
+
+  while (foo_stack.c)
+    {
+      long l = foo_stack.b[--foo_stack.c];
+
+      if (l > 0)
+	{
+	  for (int i = 0; i < 4; i++)
+	    {
+	      try
+		{
+		  if (foo_stack.s <= foo_stack.c + 1)
+		    crazy_alloc_s (&foo_stack);
+		}
+	      catch (E e)
+		{
+		  check (foo_stack.s, e.c);
+		  return;
+		}
+	      l = process (l);
+	      foo_stack.b[foo_stack.c++] = l;
+	    }
+	}
+    }
+  return;
+}
+
+
+volatile int vi;
+
+int __attribute__((noipa))
+save (int s)
+{
+  vi = s;
+  return 0;
+}
+
+int __attribute__((noipa))
+restore ()
+{
+  return vi;
+}
+
+
+void
+bar (void)
+{
+  struct S bar_stack;
+
+  bar_stack.c = 0;
+  crazy_alloc_s (&bar_stack);
+  bar_stack.b[0] = 1;
+  bar_stack.c = 1;
+
+  while (bar_stack.c)
+    {
+      long l = bar_stack.b[--bar_stack.c];
+
+      if (l > 0)
+	{
+	  for (int i = 0; i < 4; i++)
+	    {
+	      try
+		{
+		  if (i == 2)
+		    {
+		      bar_stack.s = save (bar_stack.s);
+		      unrelated_throwing ();
+		    }
+		  if (bar_stack.s <= bar_stack.c + 1)
+		    crazy_alloc_s (&bar_stack);
+		}
+	      catch (E e)
+		{
+		  check (bar_stack.s, e.c);
+		  if (e.c == 127)
+		    return;
+		  bar_stack.s = restore ();
+		  continue;
+		}
+	      l = process (l);
+	      bar_stack.b[bar_stack.c++] = l;
+	    }
+	}
+    }
+  return;
+}
+
+
+
+
+int main (int argc, char **argv)
+{
+  vs = 0;
+  pbuf = (long *) &buf;
+  foo ();
+  if (vs != 127)
+    __builtin_abort ();
+
+  vs = 0;
+  pbuf = (long *) &buf;
+  bar ();
+  if (vs != 256)
+    __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "Created a replacement for foo_stack offset" "sra"} } */
+/* { dg-final { scan-tree-dump "Created a replacement for bar_stack offset" "sra"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr109849.c b/gcc/testsuite/gcc.dg/tree-ssa/pr109849.c
new file mode 100644
index 00000000000..2d58e8daecf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr109849.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sra" } */
+
+struct D
+{
+  long a;
+  short b;
+};
+
+struct S {
+  struct D *b;
+  int c;
+  int s;
+};
+
+struct D * __attribute__((malloc)) some_realloc_function (int c);
+
+static void __attribute__((noinline))
+realloc_s (struct S *p)
+{
+  int s = p->c ? p->c * 2 : 16;
+  p->b = some_realloc_function (s);
+  p->s = s;
+  return;
+}
+
+void modify (struct D *d);
+int test (struct D *d);
+
+static struct D gd;
+
+int
+foo (void)
+{
+  struct S stack;
+
+  stack.c = 1;
+  stack.s = 0;
+  realloc_s (&stack);
+  stack.b[0] = gd;
+  stack.c = 1;
+
+  while (stack.c)
+    {
+      struct D d = stack.b[--stack.c];
+      if (test (&d))
+	{
+	  for (int i = 0; i < 8; i++)
+	    {
+	      if (stack.s <= stack.c + 1)
+		realloc_s (&stack);
+	      modify(&d);
+	      stack.b[stack.c++] = d;
+	    }
+	}
+    }
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump "Created a replacement for stack offset" "sra"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-longjmp-1.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-longjmp-1.c
new file mode 100644
index 00000000000..fa19e768130
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-longjmp-1.c
@@ -0,0 +1,87 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-sra" } */
+/* { dg-require-effective-target indirect_jumps } */
+
+#include <setjmp.h>
+
+struct S {
+  long *b;
+  int c;
+  int s;
+};
+
+static jmp_buf the_jmpbuf;
+volatile short vs = 0;
+long buf[16];
+long * volatile pbuf = (long *) &buf;
+
+static void __attribute__((noinline))
+crazy_alloc_s (struct S *p)
+{
+  int s = p->c ? p->c * 2 : 16;
+
+  long *b = pbuf;
+  if (!b || s > 16)
+    {
+      p->s = -p->s;
+      vs = 127;
+      longjmp (the_jmpbuf, 1);
+    }
+
+  __builtin_memcpy (b, p->b, p->c);
+  p->b = b;
+  p->s = s;
+  pbuf = 0;
+  return;
+}
+
+long __attribute__((noipa))
+process (long v)
+{
+  return v + 1;
+}
+
+void
+foo (void)
+{
+  struct S stack;
+
+  if (setjmp (the_jmpbuf))
+    return;
+
+  stack.c = 0;
+  crazy_alloc_s (&stack);
+  stack.b[0] = 1;
+  stack.c = 1;
+
+  while (stack.c)
+    {
+      long l = stack.b[--stack.c];
+
+      if (l > 0)
+	{
+	  for (int i = 0; i < 4; i++)
+	    {
+	      if (stack.s <= stack.c + 1)
+		crazy_alloc_s (&stack);
+	      l = process (l);
+	      stack.b[stack.c++] = l;
+	    }
+	}
+    }
+
+  return;
+}
+
+int main (int argc, char **argv)
+{
+  vs = 0;
+  pbuf = (long *) &buf;
+  foo ();
+  if (vs != 127)
+    __builtin_abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "Created a replacement for stack offset" "sra"} } */
diff --git a/gcc/testsuite/gfortran.dg/pr43984.f90 b/gcc/testsuite/gfortran.dg/pr43984.f90
index 130d114462c..dce26b0ef3b 100644
--- a/gcc/testsuite/gfortran.dg/pr43984.f90
+++ b/gcc/testsuite/gfortran.dg/pr43984.f90
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O2 -fno-tree-dominator-opts -fdump-tree-pre" }
+! { dg-options "-O2 -fno-tree-dominator-opts -fdump-tree-pre -fno-tree-sra" }
 module test
 
    type shell1quartet_type
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index b985dee6964..3a0d52675fe 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -337,6 +337,9 @@ static bitmap should_scalarize_away_bitmap, cannot_scalarize_away_bitmap;
    because this would produce non-constant expressions (e.g. Ada).  */
 static bitmap disqualified_constants;
 
+/* Bitmap of candidates which are passed by reference in call arguments.  */
+static bitmap passed_by_ref_in_call;
+
 /* Obstack for creation of fancy names.  */
 static struct obstack name_obstack;
 
@@ -717,6 +720,7 @@ sra_initialize (void)
   should_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
   cannot_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
   disqualified_constants = BITMAP_ALLOC (NULL);
+  passed_by_ref_in_call = BITMAP_ALLOC (NULL);
   gcc_obstack_init (&name_obstack);
   base_access_vec = new hash_map<tree, auto_vec<access_p> >;
   memset (&sra_stats, 0, sizeof (sra_stats));
@@ -733,6 +737,7 @@ sra_deinitialize (void)
   BITMAP_FREE (should_scalarize_away_bitmap);
   BITMAP_FREE (cannot_scalarize_away_bitmap);
   BITMAP_FREE (disqualified_constants);
+  BITMAP_FREE (passed_by_ref_in_call);
   access_pool.release ();
   assign_link_pool.release ();
   obstack_free (&name_obstack, NULL);
@@ -905,9 +910,9 @@ create_access (tree expr, gimple *stmt, bool write)
 				  &reverse);
 
   /* For constant-pool entries, check we can substitute the constant value.  */
-  if (constant_decl_p (base))
+  if (constant_decl_p (base)
+      && !bitmap_bit_p (disqualified_constants, DECL_UID (base)))
     {
-      gcc_assert (!bitmap_bit_p (disqualified_constants, DECL_UID (base)));
       if (expr != base
 	  && !is_gimple_reg_type (TREE_TYPE (expr))
 	  && dump_file && (dump_flags & TDF_DETAILS))
@@ -1135,6 +1140,17 @@ sra_handled_bf_read_p (tree expr)
 static struct access *
 build_access_from_expr_1 (tree expr, gimple *stmt, bool write)
 {
+  /* We only allow ADDR_EXPRs in arguments of function calls and those must
+     have been dealt with in build_access_from_call_arg.  Any other address
+     taking should have been caught by scan_visit_addr.   */
+  if (TREE_CODE (expr) == ADDR_EXPR)
+    {
+      tree base = get_base_address (TREE_OPERAND (expr, 0));
+      gcc_assert (!DECL_P (base)
+		  || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)));
+      return NULL;
+    }
+
   struct access *ret = NULL;
   bool partial_ref;
 
@@ -1223,6 +1239,77 @@ build_access_from_expr (tree expr, gimple *stmt, bool write)
   return false;
 }
 
+enum out_edge_check { SRA_OUTGOING_EDGES_UNCHECKED, SRA_OUTGOING_EDGES_OK,
+		      SRA_OUTGOING_EDGES_FAIL };
+
+/* Return true if STMT terminates its BB and there is an abnormal edge going
+   out of the BB, remembering the decision in OE_CHECK.  */
+
+static bool
+abnormal_edge_after_stmt_p (gimple *stmt, enum out_edge_check *oe_check)
+{
+  if (*oe_check == SRA_OUTGOING_EDGES_OK)
+    return false;
+  if (*oe_check == SRA_OUTGOING_EDGES_FAIL)
+    return true;
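+  /* This is the first query for this statement; compute the verdict and
+     cache it in *OE_CHECK for subsequent arguments of the same call.  */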
+  if (stmt_ends_bb_p (stmt))
+    {
+      edge e;
+      edge_iterator ei;
+      FOR_EACH_EDGE (e, ei, gimple_bb (stmt)->succs)
+	if (e->flags & EDGE_ABNORMAL)
+	  {
+	    *oe_check = SRA_OUTGOING_EDGES_FAIL;
+	    return true;
+	  }
+    }
+  *oe_check = SRA_OUTGOING_EDGES_OK;
+  return false;
+}
+
+/* Scan expression EXPR which is an argument of a call and create access
+   structures for all accesses to candidates for scalarization.  Return true
+   if any access has been inserted.  STMT must be the statement from which the
+   expression is taken.  OE_CHECK is used to remember the result of checking
+   for abnormal outgoing edges after STMT.  */
+
+static bool
+build_access_from_call_arg (tree expr, gimple *stmt,
+			    enum out_edge_check *oe_check)
+{
+  if (TREE_CODE (expr) == ADDR_EXPR)
+    {
+      tree base = get_base_address (TREE_OPERAND (expr, 0));
+
+      if (abnormal_edge_after_stmt_p (stmt, oe_check))
+	{
+	  disqualify_base_of_expr (base, "May lead to a need to add statements "
+				   "to an abnormal edge.");
+	  return false;
+	}
+
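+      /* Create both read and write accesses for the whole aggregate because
+	 at this point we do not know which parts of it the callee reads or
+	 writes.  */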
+      bool read = build_access_from_expr (base, stmt, false);
+      bool write = build_access_from_expr (base, stmt, true);
+      if (read || write)
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fprintf (dump_file, "Allowed ADDR_EXPR of ");
+	      print_generic_expr (dump_file, base);
+	      fprintf (dump_file, " because of ");
+	      print_gimple_stmt (dump_file, stmt, 0);
+	      fprintf (dump_file, "\n");
+	    }
+	  bitmap_set_bit (passed_by_ref_in_call, DECL_UID (base));
+	  return true;
+	}
+      else
+	return false;
+    }
+
+  return build_access_from_expr (expr, stmt, false);
+}
+
 /* Return the single non-EH successor edge of BB or NULL if there is none or
    more than one.  */
 
@@ -1364,16 +1451,18 @@ build_accesses_from_assign (gimple *stmt)
   return lacc || racc;
 }
 
-/* Callback of walk_stmt_load_store_addr_ops visit_addr used to determine
-   GIMPLE_ASM operands with memory constrains which cannot be scalarized.  */
+/* Callback of walk_stmt_load_store_addr_ops visit_addr used to detect taking
+   addresses of candidates at places which are not call arguments.  Such
+   candidates are disqualified from SRA.  This also applies to GIMPLE_ASM
+   operands with memory constraints which cannot be scalarized.  */
 
 static bool
-asm_visit_addr (gimple *, tree op, tree, void *)
+scan_visit_addr (gimple *, tree op, tree, void *)
 {
   op = get_base_address (op);
   if (op
       && DECL_P (op))
-    disqualify_candidate (op, "Non-scalarizable GIMPLE_ASM operand.");
+    disqualify_candidate (op, "Address taken in a non-call-argument context.");
 
   return false;
 }
@@ -1390,12 +1479,20 @@ scan_function (void)
   FOR_EACH_BB_FN (bb, cfun)
     {
       gimple_stmt_iterator gsi;
+      for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	walk_stmt_load_store_addr_ops (gsi_stmt (gsi), NULL, NULL, NULL,
+				       scan_visit_addr);
+
       for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 	{
 	  gimple *stmt = gsi_stmt (gsi);
 	  tree t;
 	  unsigned i;
 
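+	  /* ADDR_EXPRs in call arguments are handled separately by
+	     build_access_from_call_arg; scan all other statements for
+	     disqualifying address-takings here.  */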
+	  if (gimple_code (stmt) != GIMPLE_CALL)
+	    walk_stmt_load_store_addr_ops (stmt, NULL, NULL, NULL,
+					   scan_visit_addr);
+
 	  switch (gimple_code (stmt))
 	    {
 	    case GIMPLE_RETURN:
@@ -1409,9 +1506,15 @@ scan_function (void)
 	      break;
 
 	    case GIMPLE_CALL:
-	      for (i = 0; i < gimple_call_num_args (stmt); i++)
-		ret |= build_access_from_expr (gimple_call_arg (stmt, i),
-					       stmt, false);
+	      {
+		enum out_edge_check oe_check = SRA_OUTGOING_EDGES_UNCHECKED;
+		for (i = 0; i < gimple_call_num_args (stmt); i++)
+		  ret |= build_access_from_call_arg (gimple_call_arg (stmt, i),
+						     stmt, &oe_check);
+		if (gimple_call_chain (stmt))
+		  ret |= build_access_from_call_arg (gimple_call_chain (stmt),
+						     stmt, &oe_check);
+	      }
 
 	      t = gimple_call_lhs (stmt);
 	      if (t && !disqualify_if_bad_bb_terminating_stmt (stmt, t, NULL))
@@ -1428,8 +1531,6 @@ scan_function (void)
 	    case GIMPLE_ASM:
 	      {
 		gasm *asm_stmt = as_a <gasm *> (stmt);
-		walk_stmt_load_store_addr_ops (asm_stmt, NULL, NULL, NULL,
-					       asm_visit_addr);
 		for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++)
 		  {
 		    t = TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
@@ -1920,10 +2021,19 @@ maybe_add_sra_candidate (tree var)
       reject (var, "not aggregate");
       return false;
     }
-  /* Allow constant-pool entries that "need to live in memory".  */
-  if (needs_to_live_in_memory (var) && !constant_decl_p (var))
+
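+  /* Reject global variables and those locals whose address escapes; other
+     addressable locals remain candidates and are only disqualified during
+     scanning if their address is taken outside of a call argument.  */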
+  if ((is_global_var (var)
+       /* There are cases where non-addressable variables fail the
+	  pt_solution_includes test below, e.g. in gcc.dg/uninit-40.c.  */
+       || (TREE_ADDRESSABLE (var)
+	   && pt_solution_includes (&cfun->gimple_df->escaped, var))
+       || (TREE_CODE (var) == RESULT_DECL
+	   && !DECL_BY_REFERENCE (var)
+	   && aggregate_value_p (var, current_function_decl)))
+      /* Allow constant-pool entries that "need to live in memory".  */
+      && !constant_decl_p (var))
     {
-      reject (var, "needs to live in memory");
+      reject (var, "needs to live in memory and escapes or is global");
       return false;
     }
   if (TREE_THIS_VOLATILE (var))
@@ -2122,6 +2232,23 @@ sort_and_splice_var_accesses (tree var)
 	gcc_assert (access->offset >= low
 		    && access->offset + access->size <= high);
 
+      if (INTEGRAL_TYPE_P (access->type)
+	  && TYPE_PRECISION (access->type) != access->size
+	  && bitmap_bit_p (passed_by_ref_in_call, DECL_UID (access->base)))
+	{
+	  /* This can lead to performance regressions because we can generate
+	     excessive zero extensions.  */
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fprintf (dump_file, "Won't scalarize ");
+	      print_generic_expr (dump_file, access->base);
+	      fprintf (dump_file, "(%d), it is passed by reference to a call "
+		       "and there are accesses with precision not covering "
+		       "their type size.\n", DECL_UID (access->base));
+	    }
+	  return NULL;
+	}
+
       grp_same_access_path = path_comparable_for_same_access (access->expr);
 
       j = i + 1;
@@ -3774,12 +3901,18 @@ get_access_for_expr (tree expr)
 
 /* Replace the expression EXPR with a scalar replacement if there is one and
    generate other statements to do type conversion or subtree copying if
-   necessary.  GSI is used to place newly created statements, WRITE is true if
-   the expression is being written to (it is on a LHS of a statement or output
-   in an assembly statement).  */
+   necessary.  WRITE is true if the expression is being written to (it is on
+   the LHS of a statement or an output in an assembly statement).  STMT_GSI
+   is used to place newly created statements before the processed statement,
+   REFRESH_GSI to place them after it - unless the processed statement must
+   end a BB, in which case they are placed on the outgoing non-EH edge.
+   REFRESH_GSI is then used to continue iteration over the BB.  If
+   sra_modify_expr is called only once with WRITE equal to true on a given
+   statement, both iterator parameters can point to the same iterator.  */
 
 static bool
-sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
+sra_modify_expr (tree *expr, bool write, gimple_stmt_iterator *stmt_gsi,
+		 gimple_stmt_iterator *refresh_gsi)
 {
   location_t loc;
   struct access *access;
@@ -3806,12 +3939,12 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
   type = TREE_TYPE (*expr);
   orig_expr = *expr;
 
-  loc = gimple_location (gsi_stmt (*gsi));
+  loc = gimple_location (gsi_stmt (*stmt_gsi));
   gimple_stmt_iterator alt_gsi = gsi_none ();
-  if (write && stmt_ends_bb_p (gsi_stmt (*gsi)))
+  if (write && stmt_ends_bb_p (gsi_stmt (*stmt_gsi)))
     {
-      alt_gsi = gsi_start_edge (single_non_eh_succ (gsi_bb (*gsi)));
-      gsi = &alt_gsi;
+      alt_gsi = gsi_start_edge (single_non_eh_succ (gsi_bb (*stmt_gsi)));
+      refresh_gsi = &alt_gsi;
     }
 
   if (access->grp_to_be_replaced)
@@ -3831,7 +3964,8 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
 	{
 	  tree ref;
 
-	  ref = build_ref_for_model (loc, orig_expr, 0, access, gsi, false);
+	  ref = build_ref_for_model (loc, orig_expr, 0, access, stmt_gsi,
+				     false);
 
 	  if (partial_cplx_access)
 	    {
@@ -3847,7 +3981,7 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
 		  tree tmp = make_ssa_name (type);
 		  gassign *stmt = gimple_build_assign (tmp, t);
 		  /* This is always a read. */
-		  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+		  gsi_insert_before (stmt_gsi, stmt, GSI_SAME_STMT);
 		  t = tmp;
 		}
 	      *expr = t;
@@ -3857,22 +3991,23 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
 	      gassign *stmt;
 
 	      if (access->grp_partial_lhs)
-		ref = force_gimple_operand_gsi (gsi, ref, true, NULL_TREE,
-						 false, GSI_NEW_STMT);
+		ref = force_gimple_operand_gsi (refresh_gsi, ref, true,
+						NULL_TREE, false, GSI_NEW_STMT);
 	      stmt = gimple_build_assign (repl, ref);
 	      gimple_set_location (stmt, loc);
-	      gsi_insert_after (gsi, stmt, GSI_NEW_STMT);
+	      gsi_insert_after (refresh_gsi, stmt, GSI_NEW_STMT);
 	    }
 	  else
 	    {
 	      gassign *stmt;
 
 	      if (access->grp_partial_lhs)
-		repl = force_gimple_operand_gsi (gsi, repl, true, NULL_TREE,
-						 true, GSI_SAME_STMT);
+		repl = force_gimple_operand_gsi (stmt_gsi, repl, true,
+						 NULL_TREE, true,
+						 GSI_SAME_STMT);
 	      stmt = gimple_build_assign (ref, repl);
 	      gimple_set_location (stmt, loc);
-	      gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+	      gsi_insert_before (stmt_gsi, stmt, GSI_SAME_STMT);
 	    }
 	}
       else
@@ -3899,8 +4034,8 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
     {
       gdebug *ds = gimple_build_debug_bind (get_access_replacement (access),
 					    NULL_TREE,
-					    gsi_stmt (*gsi));
-      gsi_insert_after (gsi, ds, GSI_NEW_STMT);
+					    gsi_stmt (*stmt_gsi));
+      gsi_insert_after (stmt_gsi, ds, GSI_NEW_STMT);
     }
 
   if (access->first_child && !TREE_READONLY (access->base))
@@ -3918,8 +4053,57 @@ sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
 	start_offset = chunk_size = 0;
 
       generate_subtree_copies (access->first_child, orig_expr, access->offset,
-			       start_offset, chunk_size, gsi, write, write,
-			       loc);
+			       start_offset, chunk_size,
+			       write ? refresh_gsi : stmt_gsi,
+			       write, write, loc);
+    }
+  return true;
+}
+
+/* If EXPR, which must be a call argument, is an ADDR_EXPR, generate
+   statements copying scalar replacements to and from its base before and
+   after the call statement given in CALL_GSI and return true if any copying
+   took place.  Otherwise call sra_modify_expr on EXPR and return its value.
+   FLAGS is what gimple_call_arg_flags returns for the given parameter.  */
+
+static bool
+sra_modify_call_arg (tree *expr, gimple_stmt_iterator *call_gsi,
+		     gimple_stmt_iterator *refresh_gsi, int flags)
+{
+  if (TREE_CODE (*expr) != ADDR_EXPR)
+    return sra_modify_expr (expr, false, call_gsi, refresh_gsi);
+
+  if (flags & EAF_UNUSED)
+    return false;
+
+  tree base = get_base_address (TREE_OPERAND (*expr, 0));
+  if (!DECL_P (base))
+    return false;
+  struct access *access = get_access_for_expr (base);
+  if (!access)
+    return false;
+
+  gimple *stmt = gsi_stmt (*call_gsi);
+  location_t loc = gimple_location (stmt);
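+  /* Make sure the aggregate in its original location is up to date by
+     storing the current values of all replacements into it before the
+     call.  */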
+  generate_subtree_copies (access, base, 0, 0, 0, call_gsi, false, false,
+			   loc);
+
+  if (flags & EAF_NO_DIRECT_CLOBBER)
+    return true;
+
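+  /* The callee may have modified the aggregate; re-load the replacements
+     from it after the call, or on every outgoing edge if the call ends its
+     BB.  */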
+  if (!stmt_ends_bb_p (stmt))
+    generate_subtree_copies (access, base, 0, 0, 0, refresh_gsi, true,
+			     true, loc);
+  else
+    {
+      edge e;
+      edge_iterator ei;
+      FOR_EACH_EDGE (e, ei, gsi_bb (*call_gsi)->succs)
+	{
+	  gimple_stmt_iterator alt_gsi = gsi_start_edge (e);
+	  generate_subtree_copies (access, base, 0, 0, 0, &alt_gsi, true,
+				   true, loc);
+	}
     }
   return true;
 }
@@ -4279,9 +4463,9 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
       || TREE_CODE (lhs) == BIT_FIELD_REF)
     {
       modify_this_stmt = sra_modify_expr (gimple_assign_rhs1_ptr (stmt),
-					  gsi, false);
+					  false, gsi, gsi);
       modify_this_stmt |= sra_modify_expr (gimple_assign_lhs_ptr (stmt),
-					   gsi, true);
+					   true, gsi, gsi);
       return modify_this_stmt ? SRA_AM_MODIFIED : SRA_AM_NONE;
     }
 
@@ -4603,7 +4787,7 @@ sra_modify_function_body (void)
 	    case GIMPLE_RETURN:
 	      t = gimple_return_retval_ptr (as_a <greturn *> (stmt));
 	      if (*t != NULL_TREE)
-		modified |= sra_modify_expr (t, &gsi, false);
+		modified |= sra_modify_expr (t, false, &gsi, &gsi);
 	      break;
 
 	    case GIMPLE_ASSIGN:
@@ -4622,33 +4806,44 @@ sra_modify_function_body (void)
 		}
 	      else
 		{
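+		  /* CALL_GSI stays at the call statement so that statements
+		     can be added before it, while GSI is advanced past any
+		     statements added after the call.  */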
+		  gcall *call = as_a <gcall *> (stmt);
+		  gimple_stmt_iterator call_gsi = gsi;
+
 		  /* Operands must be processed before the lhs.  */
-		  for (i = 0; i < gimple_call_num_args (stmt); i++)
+		  for (i = 0; i < gimple_call_num_args (call); i++)
 		    {
-		      t = gimple_call_arg_ptr (stmt, i);
-		      modified |= sra_modify_expr (t, &gsi, false);
+		      int flags = gimple_call_arg_flags (call, i);
+		      t = gimple_call_arg_ptr (call, i);
+		      modified |= sra_modify_call_arg (t, &call_gsi, &gsi, flags);
 		    }
-
-	      	  if (gimple_call_lhs (stmt))
+		  if (gimple_call_chain (call))
+		    {
+		      t = gimple_call_chain_ptr (call);
+		      int flags = gimple_call_static_chain_flags (call);
+		      modified |= sra_modify_call_arg (t, &call_gsi, &gsi,
+						       flags);
+		    }
+		  if (gimple_call_lhs (call))
 		    {
-		      t = gimple_call_lhs_ptr (stmt);
-		      modified |= sra_modify_expr (t, &gsi, true);
+		      t = gimple_call_lhs_ptr (call);
+		      modified |= sra_modify_expr (t, true, &call_gsi, &gsi);
 		    }
 		}
 	      break;
 
 	    case GIMPLE_ASM:
 	      {
+		gimple_stmt_iterator stmt_gsi = gsi;
 		gasm *asm_stmt = as_a <gasm *> (stmt);
 		for (i = 0; i < gimple_asm_ninputs (asm_stmt); i++)
 		  {
 		    t = &TREE_VALUE (gimple_asm_input_op (asm_stmt, i));
-		    modified |= sra_modify_expr (t, &gsi, false);
+		    modified |= sra_modify_expr (t, false, &stmt_gsi, &gsi);
 		  }
 		for (i = 0; i < gimple_asm_noutputs (asm_stmt); i++)
 		  {
 		    t = &TREE_VALUE (gimple_asm_output_op (asm_stmt, i));
-		    modified |= sra_modify_expr (t, &gsi, true);
+		    modified |= sra_modify_expr (t, true, &stmt_gsi, &gsi);
 		  }
 	      }
 	      break;