
[2/4] SRA: Total scalarization after access propagation [PR92706]

Message ID ri6zhe8a0pu.fsf@suse.cz
State New
Series [1/4] SRA: Add verification of accesses

Commit Message

Martin Jambor Jan. 28, 2020, 12:22 a.m. UTC
Hi,

this patch fixes the second testcase in PR 92706 by performing total
scalarization quite a bit later, when access trees have already been
constructed and accesses have even been propagated from the RHSs of
assignments to their LHSs.

The new code simultaneously traverses the existing access tree and the
declared variable type and adds artificial accesses wherever they fit
in between the existing ones.  This prevents us from creating accesses
based purely on the type which then clash with accesses that have been
propagated from another access tree describing an aggregate on the RHS
of an assignment.  Such a clash means that the two sides of the
assignment get scalarized differently, leading to bad code and to the
aggregate most likely not going away.
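
To make the clash concrete, the new testcase included below
(gcc.dg/tree-ssa/pr92706-2.c) boils down to the following; the comments
are only an informal illustration of the accesses involved, not
something the pass emits:

  typedef __UINT64_TYPE__ uint64_t;
  typedef __UINT32_TYPE__ uint32_t;
  struct S { uint32_t i[2]; } __attribute__((aligned(__alignof__(uint64_t))));
  typedef uint64_t my_int64 __attribute__((may_alias));

  uint64_t load (void *p)
  {
    struct S u, v, w;
    uint64_t tem;
    tem = *(my_int64 *)p;
    /* v is only written through a 64-bit register-type access, which is
       then propagated across the aggregate copies to u and to w.  */
    *(my_int64 *)&v = tem;
    u = v;
    w = u;
    return *(my_int64 *)&w;
  }

Total scalarization driven purely by the declared type would split each
of these variables into two 32-bit accesses for i[0] and i[1], clashing
with the propagated 64-bit access, so the aggregates would most likely
not go away.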

This new version is hopefully slightly easier to read and review and
also fixes one potential bug, but otherwise does pretty much the same
thing as the first one.

Bootstrapped, LTO-bootstrapped and tested on x86_64-linux, where it
causes two new guality XPASSes.  I expect that review will lead to
requests to change things, but provided we want to fix PR 92706 now, I
believe this is the way to go.  The fix for PR 92486, which I am
sending as a follow-up, also depends on this patch.

Thanks,

Martin

2019-12-20  Martin Jambor  <mjambor@suse.cz>

	PR tree-optimization/92706
	* tree-sra.c (struct access): Adjust comment of
	grp_total_scalarization.
	(find_access_in_subtree): Look for single children spanning an entire
	access.
	(scalarizable_type_p): Allow register accesses, adjust callers.
	(completely_scalarize): Remove function.
	(scalarize_elem): Likewise.
	(create_total_scalarization_access): Likewise.
	(sort_and_splice_var_accesses): Do not track total scalarization
	flags.
	(analyze_access_subtree): New parameter totally, adjust to new meaning
	of grp_total_scalarization.
	(analyze_access_trees): Pass new parameter to analyze_access_subtree.
	(can_totally_scalarize_forest_p): New function.
	(create_total_scalarization_access): Likewise.
	(create_total_access_and_reshape): Likewise.
	(total_should_skip_creating_access): Likewise.
	(totally_scalarize_subtree): Likewise.
	(analyze_all_variable_accesses): Perform total scalarization after
	subaccess propagation using the new functions above.
	(initialize_constant_pool_replacements): Output initializers by
	traversing the access tree.

	testsuite/
	* gcc.dg/tree-ssa/pr92706-2.c: New test.
	* gcc.dg/guality/pr59776.c: Xfail tests for s2.g.
---
 gcc/testsuite/gcc.dg/guality/pr59776.c    |   4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c |  19 +
 gcc/tree-sra.c                            | 666 ++++++++++++++++------
 3 files changed, 505 insertions(+), 184 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c

Comments

Richard Biener Jan. 29, 2020, 10:09 a.m. UTC | #1
On Tue, 28 Jan 2020, Martin Jambor wrote:

> Hi,
> 
> this patch fixes the second testcase in PR 92706 by performing total
> scalarization quite a bit later, when access trees have already been
> constructed and accesses have even been propagated from the RHSs of
> assignments to their LHSs.
> 
> The new code simultaneously traverses the existing access tree and the
> declared variable type and adds artificial accesses wherever they fit
> in between the existing ones.  This prevents us from creating accesses
> based purely on the type which then clash with accesses that have been
> propagated from another access tree describing an aggregate on the RHS
> of an assignment.  Such a clash means that the two sides of the
> assignment get scalarized differently, leading to bad code and to the
> aggregate most likely not going away.
> 
> This new version is hopefully slightly easier to read and review and
> also fixes one potential bug, but otherwise does pretty much the same
> thing as the first one.
> 
> Bootstrapped, LTO-bootstrapped and tested on x86_64-linux, where it
> causes two new guality XPASSes.  I expect that review will lead to
> requests to change things, but provided we want to fix PR 92706 now, I
> believe this is the way to go.  The fix for PR 92486, which I am
> sending as a follow-up, also depends on this patch.

This patch is OK.

Thanks,
Richard.

> Thanks,
> 
> Martin
> 
> 2019-12-20  Martin Jambor  <mjambor@suse.cz>
> 
> 	PR tree-optimization/92706
> 	* tree-sra.c (struct access): Adjust comment of
> 	grp_total_scalarization.
> 	(find_access_in_subtree): Look for single children spanning an entire
> 	access.
> 	(scalarizable_type_p): Allow register accesses, adjust callers.
> 	(completely_scalarize): Remove function.
> 	(scalarize_elem): Likewise.
> 	(create_total_scalarization_access): Likewise.
> 	(sort_and_splice_var_accesses): Do not track total scalarization
> 	flags.
> 	(analyze_access_subtree): New parameter totally, adjust to new meaning
> 	of grp_total_scalarization.
> 	(analyze_access_trees): Pass new parameter to analyze_access_subtree.
> 	(can_totally_scalarize_forest_p): New function.
> 	(create_total_scalarization_access): Likewise.
> 	(create_total_access_and_reshape): Likewise.
> 	(total_should_skip_creating_access): Likewise.
> 	(totally_scalarize_subtree): Likewise.
> 	(analyze_all_variable_accesses): Perform total scalarization after
> 	subaccess propagation using the new functions above.
> 	(initialize_constant_pool_replacements): Output initializers by
> 	traversing the access tree.
> 
> 	testsuite/
> 	* gcc.dg/tree-ssa/pr92706-2.c: New test.
> 	* gcc.dg/guality/pr59776.c: Xfail tests for s2.g.
> ---
>  gcc/testsuite/gcc.dg/guality/pr59776.c    |   4 +-
>  gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c |  19 +
>  gcc/tree-sra.c                            | 666 ++++++++++++++++------
>  3 files changed, 505 insertions(+), 184 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c
> 
> diff --git a/gcc/testsuite/gcc.dg/guality/pr59776.c b/gcc/testsuite/gcc.dg/guality/pr59776.c
> index 382abb622bb..6c1c8165b70 100644
> --- a/gcc/testsuite/gcc.dg/guality/pr59776.c
> +++ b/gcc/testsuite/gcc.dg/guality/pr59776.c
> @@ -12,11 +12,11 @@ foo (struct S *p)
>    struct S s1, s2;			/* { dg-final { gdb-test pr59776.c:17 "s1.f" "5.0" } } */
>    s1 = *p;				/* { dg-final { gdb-test pr59776.c:17 "s1.g" "6.0" } } */
>    s2 = s1;				/* { dg-final { gdb-test pr59776.c:17 "s2.f" "0.0" } } */
> -  *(int *) &s2.f = 0;			/* { dg-final { gdb-test pr59776.c:17 "s2.g" "6.0" } } */
> +  *(int *) &s2.f = 0;			/* { dg-final { gdb-test pr59776.c:17 "s2.g" "6.0" { xfail *-*-* } } } */
>    asm volatile (NOP : : : "memory");	/* { dg-final { gdb-test pr59776.c:20 "s1.f" "5.0" } } */
>    asm volatile (NOP : : : "memory");	/* { dg-final { gdb-test pr59776.c:20 "s1.g" "6.0" } } */
>    s2 = s1;				/* { dg-final { gdb-test pr59776.c:20 "s2.f" "5.0" } } */
> -  asm volatile (NOP : : : "memory");	/* { dg-final { gdb-test pr59776.c:20 "s2.g" "6.0" } } */
> +  asm volatile (NOP : : : "memory");	/* { dg-final { gdb-test pr59776.c:20 "s2.g" "6.0" { xfail *-*-* } } } */
>    asm volatile (NOP : : : "memory");
>  }
>  
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c
> new file mode 100644
> index 00000000000..37ab9765db0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-esra" } */
> +
> +typedef __UINT64_TYPE__ uint64_t;
> +typedef __UINT32_TYPE__ uint32_t;
> +struct S { uint32_t i[2]; } __attribute__((aligned(__alignof__(uint64_t))));
> +typedef uint64_t my_int64 __attribute__((may_alias));
> +uint64_t load (void *p)
> +{
> +  struct S u, v, w;
> +  uint64_t tem;
> +  tem = *(my_int64 *)p;
> +  *(my_int64 *)&v = tem;
> +  u = v;
> +  w = u;
> +  return *(my_int64 *)&w;
> +}
> +
> +/* { dg-final { scan-tree-dump "Created a replacement for v" "esra" } } */
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 36106fecaf1..2b0849858de 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -211,8 +211,11 @@ struct access
>       is not propagated in the access tree in any direction.  */
>    unsigned grp_scalar_write : 1;
>  
> -  /* Is this access an artificial one created to scalarize some record
> -     entirely? */
> +  /* In a root of an access tree, true means that the entire tree should be
> +     totally scalarized - that all scalar leaves should be scalarized and
> +     non-root grp_total_scalarization accesses should be honored.  Otherwise,
> +     non-root accesses with grp_total_scalarization should never get scalar
> +     replacements.  */
>    unsigned grp_total_scalarization : 1;
>  
>    /* Other passes of the analysis use this bit to make function
> @@ -485,6 +488,15 @@ find_access_in_subtree (struct access *access, HOST_WIDE_INT offset,
>        access = child;
>      }
>  
> +  /* Total scalarization does not replace single field structures with their
> +     single field but rather creates an access for them underneath.  Look for
> +     it.  */
> +  if (access)
> +    while (access->first_child
> +	   && access->first_child->offset == offset
> +	   && access->first_child->size == size)
> +      access = access->first_child;
> +
>    return access;
>  }
>  
> @@ -856,7 +868,8 @@ create_access (tree expr, gimple *stmt, bool write)
>  static bool
>  scalarizable_type_p (tree type, bool const_decl)
>  {
> -  gcc_assert (!is_gimple_reg_type (type));
> +  if (is_gimple_reg_type (type))
> +    return true;
>    if (type_contains_placeholder_p (type))
>      return false;
>  
> @@ -871,8 +884,7 @@ scalarizable_type_p (tree type, bool const_decl)
>  	  if (DECL_BIT_FIELD (fld))
>  	    return false;
>  
> -	  if (!is_gimple_reg_type (ft)
> -	      && !scalarizable_type_p (ft, const_decl))
> +	  if (!scalarizable_type_p (ft, const_decl))
>  	    return false;
>  	}
>  
> @@ -902,8 +914,7 @@ scalarizable_type_p (tree type, bool const_decl)
>  	return false;
>  
>        tree elem = TREE_TYPE (type);
> -      if (!is_gimple_reg_type (elem)
> -	  && !scalarizable_type_p (elem, const_decl))
> +      if (!scalarizable_type_p (elem, const_decl))
>  	return false;
>        return true;
>      }
> @@ -912,114 +923,6 @@ scalarizable_type_p (tree type, bool const_decl)
>    }
>  }
>  
> -static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree, tree);
> -
> -/* Create total_scalarization accesses for all scalar fields of a member
> -   of type DECL_TYPE conforming to scalarizable_type_p.  BASE
> -   must be the top-most VAR_DECL representing the variable; within that,
> -   OFFSET locates the member and REF must be the memory reference expression for
> -   the member.  */
> -
> -static void
> -completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
> -{
> -  switch (TREE_CODE (decl_type))
> -    {
> -    case RECORD_TYPE:
> -      for (tree fld = TYPE_FIELDS (decl_type); fld; fld = DECL_CHAIN (fld))
> -	if (TREE_CODE (fld) == FIELD_DECL)
> -	  {
> -	    HOST_WIDE_INT pos = offset + int_bit_position (fld);
> -	    tree ft = TREE_TYPE (fld);
> -	    tree nref = build3 (COMPONENT_REF, ft, ref, fld, NULL_TREE);
> -
> -	    scalarize_elem (base, pos, tree_to_uhwi (DECL_SIZE (fld)),
> -			    TYPE_REVERSE_STORAGE_ORDER (decl_type),
> -			    nref, ft);
> -	  }
> -      break;
> -    case ARRAY_TYPE:
> -      {
> -	tree elemtype = TREE_TYPE (decl_type);
> -	tree elem_size = TYPE_SIZE (elemtype);
> -	gcc_assert (elem_size && tree_fits_shwi_p (elem_size));
> -	HOST_WIDE_INT el_size = tree_to_shwi (elem_size);
> -	gcc_assert (el_size > 0);
> -
> -	tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type));
> -	gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
> -	tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type));
> -	/* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
> -	if (maxidx)
> -	  {
> -	    gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
> -	    tree domain = TYPE_DOMAIN (decl_type);
> -	    /* MINIDX and MAXIDX are inclusive, and must be interpreted in
> -	       DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
> -	    offset_int idx = wi::to_offset (minidx);
> -	    offset_int max = wi::to_offset (maxidx);
> -	    if (!TYPE_UNSIGNED (domain))
> -	      {
> -		idx = wi::sext (idx, TYPE_PRECISION (domain));
> -		max = wi::sext (max, TYPE_PRECISION (domain));
> -	      }
> -	    for (int el_off = offset; idx <= max; ++idx)
> -	      {
> -		tree nref = build4 (ARRAY_REF, elemtype,
> -				    ref,
> -				    wide_int_to_tree (domain, idx),
> -				    NULL_TREE, NULL_TREE);
> -		scalarize_elem (base, el_off, el_size,
> -				TYPE_REVERSE_STORAGE_ORDER (decl_type),
> -				nref, elemtype);
> -		el_off += el_size;
> -	      }
> -	  }
> -      }
> -      break;
> -    default:
> -      gcc_unreachable ();
> -    }
> -}
> -
> -/* Create total_scalarization accesses for a member of type TYPE, which must
> -   satisfy either is_gimple_reg_type or scalarizable_type_p.  BASE must be the
> -   top-most VAR_DECL representing the variable; within that, POS and SIZE locate
> -   the member, REVERSE gives its torage order. and REF must be the reference
> -   expression for it.  */
> -
> -static void
> -scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
> -		tree ref, tree type)
> -{
> -  if (is_gimple_reg_type (type))
> -  {
> -    struct access *access = create_access_1 (base, pos, size);
> -    access->expr = ref;
> -    access->type = type;
> -    access->grp_total_scalarization = 1;
> -    access->reverse = reverse;
> -    /* Accesses for intraprocedural SRA can have their stmt NULL.  */
> -  }
> -  else
> -    completely_scalarize (base, type, pos, ref);
> -}
> -
> -/* Create a total_scalarization access for VAR as a whole.  VAR must be of a
> -   RECORD_TYPE or ARRAY_TYPE conforming to scalarizable_type_p.  */
> -
> -static void
> -create_total_scalarization_access (tree var)
> -{
> -  HOST_WIDE_INT size = tree_to_uhwi (DECL_SIZE (var));
> -  struct access *access;
> -
> -  access = create_access_1 (var, 0, size);
> -  access->expr = var;
> -  access->type = TREE_TYPE (var);
> -  access->grp_total_scalarization = 1;
> -}
> -
>  /* Return true if REF has an VIEW_CONVERT_EXPR somewhere in it.  */
>  
>  static inline bool
> @@ -2029,7 +1932,6 @@ sort_and_splice_var_accesses (tree var)
>        bool grp_assignment_read = access->grp_assignment_read;
>        bool grp_assignment_write = access->grp_assignment_write;
>        bool multiple_scalar_reads = false;
> -      bool total_scalarization = access->grp_total_scalarization;
>        bool grp_partial_lhs = access->grp_partial_lhs;
>        bool first_scalar = is_gimple_reg_type (access->type);
>        bool unscalarizable_region = access->grp_unscalarizable_region;
> @@ -2081,7 +1983,6 @@ sort_and_splice_var_accesses (tree var)
>  	  grp_assignment_write |= ac2->grp_assignment_write;
>  	  grp_partial_lhs |= ac2->grp_partial_lhs;
>  	  unscalarizable_region |= ac2->grp_unscalarizable_region;
> -	  total_scalarization |= ac2->grp_total_scalarization;
>  	  relink_to_new_repr (access, ac2);
>  
>  	  /* If there are both aggregate-type and scalar-type accesses with
> @@ -2122,9 +2023,7 @@ sort_and_splice_var_accesses (tree var)
>        access->grp_scalar_write = grp_scalar_write;
>        access->grp_assignment_read = grp_assignment_read;
>        access->grp_assignment_write = grp_assignment_write;
> -      access->grp_hint = total_scalarization
> -	|| (multiple_scalar_reads && !constant_decl_p (var));
> -      access->grp_total_scalarization = total_scalarization;
> +      access->grp_hint = multiple_scalar_reads && !constant_decl_p (var);
>        access->grp_partial_lhs = grp_partial_lhs;
>        access->grp_unscalarizable_region = unscalarizable_region;
>        access->grp_same_access_path = grp_same_access_path;
> @@ -2420,15 +2319,16 @@ expr_with_var_bounded_array_refs_p (tree expr)
>  }
>  
>  /* Analyze the subtree of accesses rooted in ROOT, scheduling replacements when
> -   both seeming beneficial and when ALLOW_REPLACEMENTS allows it.  Also set all
> -   sorts of access flags appropriately along the way, notably always set
> -   grp_read and grp_assign_read according to MARK_READ and grp_write when
> -   MARK_WRITE is true.
> +   both seeming beneficial and when ALLOW_REPLACEMENTS allows it.  If TOTALLY
> +   is set, we are totally scalarizing the aggregate.  Also set all sorts of
> +   access flags appropriately along the way, notably always set grp_read and
> +   grp_assign_read according to MARK_READ and grp_write when MARK_WRITE is
> +   true.
>  
>     Creating a replacement for a scalar access is considered beneficial if its
> -   grp_hint is set (this means we are either attempting total scalarization or
> -   there is more than one direct read access) or according to the following
> -   table:
> +   grp_hint or TOTALLY is set (this means either that there is more than one
> +   direct read access or that we are attempting total scalarization) or
> +   according to the following table:
>  
>     Access written to through a scalar type (once or more times)
>     |
> @@ -2459,7 +2359,7 @@ expr_with_var_bounded_array_refs_p (tree expr)
>  
>  static bool
>  analyze_access_subtree (struct access *root, struct access *parent,
> -			bool allow_replacements)
> +			bool allow_replacements, bool totally)
>  {
>    struct access *child;
>    HOST_WIDE_INT limit = root->offset + root->size;
> @@ -2477,8 +2377,6 @@ analyze_access_subtree (struct access *root, struct access *parent,
>  	root->grp_write = 1;
>        if (parent->grp_assignment_write)
>  	root->grp_assignment_write = 1;
> -      if (parent->grp_total_scalarization)
> -	root->grp_total_scalarization = 1;
>        if (!parent->grp_same_access_path)
>  	root->grp_same_access_path = 0;
>      }
> @@ -2493,10 +2391,10 @@ analyze_access_subtree (struct access *root, struct access *parent,
>      {
>        hole |= covered_to < child->offset;
>        sth_created |= analyze_access_subtree (child, root,
> -					     allow_replacements && !scalar);
> +					     allow_replacements && !scalar,
> +					     totally);
>  
>        root->grp_unscalarized_data |= child->grp_unscalarized_data;
> -      root->grp_total_scalarization &= child->grp_total_scalarization;
>        if (child->grp_covered)
>  	covered_to += child->size;
>        else
> @@ -2504,7 +2402,9 @@ analyze_access_subtree (struct access *root, struct access *parent,
>      }
>  
>    if (allow_replacements && scalar && !root->first_child
> -      && (root->grp_hint
> +      && (totally || !root->grp_total_scalarization)
> +      && (totally
> +	  || root->grp_hint
>  	  || ((root->grp_scalar_read || root->grp_assignment_read)
>  	      && (root->grp_scalar_write || root->grp_assignment_write))))
>      {
> @@ -2546,6 +2446,7 @@ analyze_access_subtree (struct access *root, struct access *parent,
>      {
>        if (allow_replacements
>  	  && scalar && !root->first_child
> +	  && !root->grp_total_scalarization
>  	  && (root->grp_scalar_write || root->grp_assignment_write)
>  	  && !bitmap_bit_p (cannot_scalarize_away_bitmap,
>  			    DECL_UID (root->base)))
> @@ -2566,7 +2467,7 @@ analyze_access_subtree (struct access *root, struct access *parent,
>  	root->grp_total_scalarization = 0;
>      }
>  
> -  if (!hole || root->grp_total_scalarization)
> +  if (!hole || totally)
>      root->grp_covered = 1;
>    else if (root->grp_write || comes_initialized_p (root->base))
>      root->grp_unscalarized_data = 1; /* not covered and written to */
> @@ -2582,7 +2483,8 @@ analyze_access_trees (struct access *access)
>  
>    while (access)
>      {
> -      if (analyze_access_subtree (access, NULL, true))
> +      if (analyze_access_subtree (access, NULL, true,
> +				  access->grp_total_scalarization))
>  	ret = true;
>        access = access->next_grp;
>      }
> @@ -2855,6 +2757,369 @@ propagate_all_subaccesses (void)
>      }
>  }
>  
> +/* Return true if the forest beginning with ROOT does not contain
> +   unscalarizable regions or non-byte aligned accesses.  */
> +
> +static bool
> +can_totally_scalarize_forest_p (struct access *root)
> +{
> +  struct access *access = root;
> +  do
> +    {
> +      if (access->grp_unscalarizable_region
> +	  || (access->offset % BITS_PER_UNIT) != 0
> +	  || (access->size % BITS_PER_UNIT) != 0
> +	  || (is_gimple_reg_type (access->type)
> +	      && access->first_child))
> +	return false;
> +
> +      if (access->first_child)
> +	access = access->first_child;
> +      else if (access->next_sibling)
> +	access = access->next_sibling;
> +      else
> +	{
> +	  while (access->parent && !access->next_sibling)
> +	    access = access->parent;
> +	  if (access->next_sibling)
> +	    access = access->next_sibling;
> +	  else
> +	    {
> +	      gcc_assert (access == root);
> +	      root = root->next_grp;
> +	      access = root;
> +	    }
> +	}
> +    }
> +  while (access);
> +  return true;
> +}
> +
> +/* Create and return an ACCESS in PARENT spanning from POS with SIZE, TYPE and
> +   reference EXPR for total scalarization purposes and mark it as such.  Within
> +   the children of PARENT, link it in between PTR and NEXT_SIBLING.  */
> +
> +static struct access *
> +create_total_scalarization_access (struct access *parent, HOST_WIDE_INT pos,
> +				   HOST_WIDE_INT size, tree type, tree expr,
> +				   struct access **ptr,
> +				   struct access *next_sibling)
> +{
> +  struct access *access = access_pool.allocate ();
> +  memset (access, 0, sizeof (struct access));
> +  access->base = parent->base;
> +  access->offset = pos;
> +  access->size = size;
> +  access->expr = expr;
> +  access->type = type;
> +  access->parent = parent;
> +  access->grp_write = parent->grp_write;
> +  access->grp_total_scalarization = 1;
> +  access->grp_hint = 1;
> +  access->grp_same_access_path = path_comparable_for_same_access (expr);
> +  access->reverse = reverse_storage_order_for_component_p (expr);
> +
> +  access->next_sibling = next_sibling;
> +  *ptr = access;
> +  return access;
> +}
> +
> +/* Create and return an ACCESS in PARENT spanning from POS with SIZE, TYPE and
> +   reference EXPR for total scalarization purposes and mark it as such, link it
> +   at *PTR and reshape the tree so that those elements at *PTR and their
> +   siblings which fall within the part described by POS and SIZE are moved to
> +   be children of the new access.  If a partial overlap is detected, return
> +   NULL.  */
> +
> +static struct access *
> +create_total_access_and_reshape (struct access *parent, HOST_WIDE_INT pos,
> +				 HOST_WIDE_INT size, tree type, tree expr,
> +				 struct access **ptr)
> +{
> +  struct access **p = ptr;
> +
> +  while (*p && (*p)->offset < pos + size)
> +    {
> +      if ((*p)->offset + (*p)->size > pos + size)
> +	return NULL;
> +      p = &(*p)->next_sibling;
> +    }
> +
> +  struct access *next_child = *ptr;
> +  struct access *new_acc
> +    = create_total_scalarization_access (parent, pos, size, type, expr,
> +					 ptr, *p);
> +  if (p != ptr)
> +    {
> +      new_acc->first_child = next_child;
> +      *p = NULL;
> +      for (struct access *a = next_child; a; a = a->next_sibling)
> +	a->parent = new_acc;
> +    }
> +  return new_acc;
> +}
> +
> +static bool totally_scalarize_subtree (struct access *root);
> +
> +/* Return true if INNER is either the same type as OUTER or if it is the type
> +   of a record field in OUTER at offset zero, possibly in nested
> +   sub-records.  */
> +
> +static bool
> +access_and_field_type_match_p (tree outer, tree inner)
> +{
> +  if (TYPE_MAIN_VARIANT (outer) == TYPE_MAIN_VARIANT (inner))
> +    return true;
> +  if (TREE_CODE (outer) != RECORD_TYPE)
> +    return false;
> +  tree fld = TYPE_FIELDS (outer);
> +  while (fld)
> +    {
> +     if (TREE_CODE (fld) == FIELD_DECL)
> +       {
> +	if (!zerop (DECL_FIELD_OFFSET (fld)))
> +	  return false;
> +	if (TYPE_MAIN_VARIANT (TREE_TYPE (fld)) == inner)
> +	  return true;
> +	if (TREE_CODE (TREE_TYPE (fld)) == RECORD_TYPE)
> +	  fld = TYPE_FIELDS (TREE_TYPE (fld));
> +	else
> +	  return false;
> +       }
> +     else
> +       fld = DECL_CHAIN (fld);
> +    }
> +  return false;
> +}
> +
> +/* Return type of total_should_skip_creating_access indicating whether a total
> +   scalarization access for a field/element should be created, whether it
> +   already exists or whether the entire total scalarization has to fail.  */
> +
> +enum total_sra_field_state {TOTAL_FLD_CREATE, TOTAL_FLD_DONE, TOTAL_FLD_FAILED};
> +
> +/* Do all the necessary steps in total scalarization when a part of the given
> +   aggregate of type TYPE at POS with the given SIZE should be put into PARENT
> +   and when we have processed all its siblings with smaller offsets up to and
> +   including LAST_SEEN_SIBLING (which can be NULL).
> +
> +   If some further siblings are to be skipped, set *LAST_SEEN_SIBLING as
> +   appropriate.  Return TOTAL_FLD_CREATE if the caller should carry on with
> +   creating a new access, TOTAL_FLD_DONE if an access or accesses capable of
> +   representing the described part of the aggregate for the purposes of total
> +   scalarization already exist or TOTAL_FLD_FAILED if there is a problem which
> +   prevents total scalarization from happening at all.  */
> +
> +static enum total_sra_field_state
> +total_should_skip_creating_access (struct access *parent,
> +				   struct access **last_seen_sibling,
> +				   tree type, HOST_WIDE_INT pos,
> +				   HOST_WIDE_INT size)
> +{
> +  struct access *next_child;
> +  if (!*last_seen_sibling)
> +    next_child = parent->first_child;
> +  else
> +    next_child = (*last_seen_sibling)->next_sibling;
> +
> +  /* First, traverse the chain of siblings until it points to an access with
> +     offset at least equal to POS.  Check all skipped accesses whether they
> +     span the POS boundary and if so, return with a failure.  */
> +  while (next_child && next_child->offset < pos)
> +    {
> +      if (next_child->offset + next_child->size > pos)
> +	return TOTAL_FLD_FAILED;
> +      *last_seen_sibling = next_child;
> +      next_child = next_child->next_sibling;
> +    }
> +
> +  /* Now check whether next_child has exactly the right POS and SIZE and if so,
> +     whether it can represent what we need and can be totally scalarized
> +     itself.  */
> +  if (next_child && next_child->offset == pos
> +      && next_child->size == size)
> +    {
> +      if (!is_gimple_reg_type (next_child->type)
> +	  && (!access_and_field_type_match_p (type, next_child->type)
> +	      || !totally_scalarize_subtree (next_child)))
> +	return TOTAL_FLD_FAILED;
> +
> +      *last_seen_sibling = next_child;
> +      return TOTAL_FLD_DONE;
> +    }
> +
> +  /* If the child we're looking at would partially overlap, we just cannot
> +     totally scalarize.  */
> +  if (next_child
> +      && next_child->offset < pos + size
> +      && next_child->offset + next_child->size > pos + size)
> +    return TOTAL_FLD_FAILED;
> +
> +  if (is_gimple_reg_type (type))
> +    {
> +      /* We don't scalarize accesses that are children of other scalar type
> +	 accesses, so if we go on and create an access for a register type,
> +	 there should not be any pre-existing children.  There are rare cases
> +	 where the requested type is a vector but we already have register
> +	 accesses for all its elements which is equally good.  Detect that
> +	 situation or whether we need to bail out.  */
> +
> +      HOST_WIDE_INT covered = pos;
> +      bool skipping = false;
> +      while (next_child
> +	     && next_child->offset + next_child->size <= pos + size)
> +	{
> +	  if (next_child->offset != covered
> +	      || !is_gimple_reg_type (next_child->type))
> +	    return TOTAL_FLD_FAILED;
> +
> +	  covered += next_child->size;
> +	  *last_seen_sibling = next_child;
> +	  next_child = next_child->next_sibling;
> +	  skipping = true;
> +	}
> +
> +      if (skipping)
> +	{
> +	  if (covered != pos + size)
> +	    return TOTAL_FLD_FAILED;
> +	  else
> +	    return TOTAL_FLD_DONE;
> +	}
> +    }
> +
> +  return TOTAL_FLD_CREATE;
> +}
> +
> +/* Go over the sub-tree rooted in ROOT and attempt to create scalar accesses
> +   spanning all areas of ROOT not covered by existing ones, return false if the
> +   attempt failed.  All created accesses will have grp_total_scalarization set
> +   (and should be ignored if the function returns false).  */
> +
> +static bool
> +totally_scalarize_subtree (struct access *root)
> +{
> +  gcc_checking_assert (!root->grp_unscalarizable_region);
> +  gcc_checking_assert (!is_gimple_reg_type (root->type));
> +
> +  struct access *last_seen_sibling = NULL;
> +
> +  switch (TREE_CODE (root->type))
> +    {
> +    case RECORD_TYPE:
> +      for (tree fld = TYPE_FIELDS (root->type); fld; fld = DECL_CHAIN (fld))
> +	if (TREE_CODE (fld) == FIELD_DECL)
> +	  {
> +	    tree ft = TREE_TYPE (fld);
> +	    HOST_WIDE_INT fsize = tree_to_uhwi (DECL_SIZE (fld));
> +	    if (!fsize)
> +	      continue;
> +
> +	    HOST_WIDE_INT pos = root->offset + int_bit_position (fld);
> +	    enum total_sra_field_state
> +	      state = total_should_skip_creating_access (root,
> +							 &last_seen_sibling,
> +							 ft, pos, fsize);
> +	    switch (state)
> +	      {
> +	      case TOTAL_FLD_FAILED:
> +		return false;
> +	      case TOTAL_FLD_DONE:
> +		continue;
> +	      case TOTAL_FLD_CREATE:
> +		break;
> +	      default:
> +		gcc_unreachable ();
> +	      }
> +
> +	    struct access **p = (last_seen_sibling
> +				 ? &last_seen_sibling->next_sibling
> +				 : &root->first_child);
> +	    tree nref = build3 (COMPONENT_REF, ft, root->expr, fld, NULL_TREE);
> +	    struct access *new_child
> +	      = create_total_access_and_reshape (root, pos, fsize, ft, nref, p);
> +	    if (!new_child)
> +	      return false;
> +
> +	    if (!is_gimple_reg_type (ft)
> +		&& !totally_scalarize_subtree (new_child))
> +	      return false;
> +	    last_seen_sibling = new_child;
> +	  }
> +      break;
> +    case ARRAY_TYPE:
> +      {
> +	tree elemtype = TREE_TYPE (root->type);
> +	tree elem_size = TYPE_SIZE (elemtype);
> +	gcc_assert (elem_size && tree_fits_shwi_p (elem_size));
> +	HOST_WIDE_INT el_size = tree_to_shwi (elem_size);
> +	gcc_assert (el_size > 0);
> +
> +	tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (root->type));
> +	gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
> +	tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (root->type));
> +	/* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
> +	if (!maxidx)
> +	  goto out;
> +	gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
> +	tree domain = TYPE_DOMAIN (root->type);
> +	/* MINIDX and MAXIDX are inclusive, and must be interpreted in
> +	   DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
> +	offset_int idx = wi::to_offset (minidx);
> +	offset_int max = wi::to_offset (maxidx);
> +	if (!TYPE_UNSIGNED (domain))
> +	  {
> +	    idx = wi::sext (idx, TYPE_PRECISION (domain));
> +	    max = wi::sext (max, TYPE_PRECISION (domain));
> +	  }
> +	for (HOST_WIDE_INT pos = root->offset;
> +	     idx <= max;
> +	     pos += el_size, ++idx)
> +	  {
> +	    enum total_sra_field_state
> +	      state = total_should_skip_creating_access (root,
> +							 &last_seen_sibling,
> +							 elemtype, pos,
> +							 el_size);
> +	    switch (state)
> +	      {
> +	      case TOTAL_FLD_FAILED:
> +		return false;
> +	      case TOTAL_FLD_DONE:
> +		continue;
> +	      case TOTAL_FLD_CREATE:
> +		break;
> +	      default:
> +		gcc_unreachable ();
> +	      }
> +
> +	    struct access **p = (last_seen_sibling
> +				 ? &last_seen_sibling->next_sibling
> +				 : &root->first_child);
> +	    tree nref = build4 (ARRAY_REF, elemtype, root->expr,
> +				wide_int_to_tree (domain, idx),
> +				NULL_TREE, NULL_TREE);
> +	    struct access *new_child
> +	      = create_total_access_and_reshape (root, pos, el_size, elemtype,
> +						 nref, p);
> +	    if (!new_child)
> +	      return false;
> +
> +	    if (!is_gimple_reg_type (elemtype)
> +		&& !totally_scalarize_subtree (new_child))
> +	      return false;
> +	    last_seen_sibling = new_child;
> +	  }
> +      }
> +      break;
> +    default:
> +      gcc_unreachable ();
> +    }
> +
> + out:
> +  return true;
> +}
> +
>  /* Go through all accesses collected throughout the (intraprocedural) analysis
>     stage, exclude overlapping ones, identify representatives and build trees
>     out of them, making decisions about scalarization on the way.  Return true
> @@ -2867,8 +3132,22 @@ analyze_all_variable_accesses (void)
>    bitmap tmp = BITMAP_ALLOC (NULL);
>    bitmap_iterator bi;
>    unsigned i;
> -  bool optimize_speed_p = !optimize_function_for_size_p (cfun);
>  
> +  bitmap_copy (tmp, candidate_bitmap);
> +  EXECUTE_IF_SET_IN_BITMAP (tmp, 0, i, bi)
> +    {
> +      tree var = candidate (i);
> +      struct access *access;
> +
> +      access = sort_and_splice_var_accesses (var);
> +      if (!access || !build_access_trees (access))
> +	disqualify_candidate (var,
> +			      "No or inhibitingly overlapping accesses.");
> +    }
> +
> +  propagate_all_subaccesses ();
> +
> +  bool optimize_speed_p = !optimize_function_for_size_p (cfun);
>    /* If the user didn't set PARAM_SRA_MAX_SCALARIZATION_SIZE_<...>,
>       fall back to a target default.  */
>    unsigned HOST_WIDE_INT max_scalarization_size
> @@ -2884,7 +3163,6 @@ analyze_all_variable_accesses (void)
>        if (global_options_set.x_param_sra_max_scalarization_size_size)
>  	max_scalarization_size = param_sra_max_scalarization_size_size;
>      }
> -
>    max_scalarization_size *= BITS_PER_UNIT;
>  
>    EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi)
> @@ -2892,46 +3170,56 @@ analyze_all_variable_accesses (void)
>  	&& !bitmap_bit_p (cannot_scalarize_away_bitmap, i))
>        {
>  	tree var = candidate (i);
> +	if (!VAR_P (var))
> +	  continue;
>  
> -	if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var),
> -						constant_decl_p (var)))
> +	if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) > max_scalarization_size)
>  	  {
> -	    if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
> -		<= max_scalarization_size)
> -	      {
> -		create_total_scalarization_access (var);
> -		completely_scalarize (var, TREE_TYPE (var), 0, var);
> -		statistics_counter_event (cfun,
> -					  "Totally-scalarized aggregates", 1);
> -		if (dump_file && (dump_flags & TDF_DETAILS))
> -		  {
> -		    fprintf (dump_file, "Will attempt to totally scalarize ");
> -		    print_generic_expr (dump_file, var);
> -		    fprintf (dump_file, " (UID: %u): \n", DECL_UID (var));
> -		  }
> -	      }
> -	    else if (dump_file && (dump_flags & TDF_DETAILS))
> +	    if (dump_file && (dump_flags & TDF_DETAILS))
>  	      {
>  		fprintf (dump_file, "Too big to totally scalarize: ");
>  		print_generic_expr (dump_file, var);
>  		fprintf (dump_file, " (UID: %u)\n", DECL_UID (var));
>  	      }
> +	    continue;
>  	  }
> -      }
>  
> -  bitmap_copy (tmp, candidate_bitmap);
> -  EXECUTE_IF_SET_IN_BITMAP (tmp, 0, i, bi)
> -    {
> -      tree var = candidate (i);
> -      struct access *access;
> +	bool all_types_ok = true;
> +	for (struct access *access = get_first_repr_for_decl (var);
> +	     access;
> +	     access = access->next_grp)
> +	  if (!can_totally_scalarize_forest_p (access)
> +	      || !scalarizable_type_p (access->type, constant_decl_p (var)))
> +	    {
> +	      all_types_ok = false;
> +	      break;
> +	    }
> +	if (!all_types_ok)
> +	  continue;
>  
> -      access = sort_and_splice_var_accesses (var);
> -      if (!access || !build_access_trees (access))
> -	disqualify_candidate (var,
> -			      "No or inhibitingly overlapping accesses.");
> -    }
> +	if (dump_file && (dump_flags & TDF_DETAILS))
> +	  {
> +	    fprintf (dump_file, "Will attempt to totally scalarize ");
> +	    print_generic_expr (dump_file, var);
> +	    fprintf (dump_file, " (UID: %u): \n", DECL_UID (var));
> +	  }
> +	bool scalarized = true;
> +	for (struct access *access = get_first_repr_for_decl (var);
> +	     access;
> +	     access = access->next_grp)
> +	  if (!is_gimple_reg_type (access->type)
> +	      && !totally_scalarize_subtree (access))
> +	    {
> +	      scalarized = false;
> +	      break;
> +	    }
>  
> -  propagate_all_subaccesses ();
> +	if (scalarized)
> +	  for (struct access *access = get_first_repr_for_decl (var);
> +	       access;
> +	       access = access->next_grp)
> +	    access->grp_total_scalarization = true;
> +      }
>  
>    if (flag_checking)
>      verify_all_sra_access_forests ();
> @@ -3804,25 +4092,39 @@ initialize_constant_pool_replacements (void)
>        tree var = candidate (i);
>        if (!constant_decl_p (var))
>  	continue;
> -      vec<access_p> *access_vec = get_base_access_vector (var);
> -      if (!access_vec)
> -	continue;
> -      for (unsigned i = 0; i < access_vec->length (); i++)
> +
> +      struct access *access = get_first_repr_for_decl (var);
> +
> +      while (access)
>  	{
> -	  struct access *access = (*access_vec)[i];
> -	  if (!access->replacement_decl)
> -	    continue;
> -	  gassign *stmt
> -	    = gimple_build_assign (get_access_replacement (access),
> -				   unshare_expr (access->expr));
> -	  if (dump_file && (dump_flags & TDF_DETAILS))
> +	  if (access->replacement_decl)
>  	    {
> -	      fprintf (dump_file, "Generating constant initializer: ");
> -	      print_gimple_stmt (dump_file, stmt, 0);
> -	      fprintf (dump_file, "\n");
> +	      gassign *stmt
> +		= gimple_build_assign (get_access_replacement (access),
> +				       unshare_expr (access->expr));
> +	      if (dump_file && (dump_flags & TDF_DETAILS))
> +		{
> +		  fprintf (dump_file, "Generating constant initializer: ");
> +		  print_gimple_stmt (dump_file, stmt, 0);
> +		  fprintf (dump_file, "\n");
> +		}
> +	      gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
> +	      update_stmt (stmt);
> +	    }
> +
> +	  if (access->first_child)
> +	    access = access->first_child;
> +	  else if (access->next_sibling)
> +	    access = access->next_sibling;
> +	  else
> +	    {
> +	      while (access->parent && !access->next_sibling)
> +		access = access->parent;
> +	      if (access->next_sibling)
> +		access = access->next_sibling;
> +	      else
> +		access = access->next_grp;
>  	    }
> -	  gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
> -	  update_stmt (stmt);
>  	}
>      }
>  
>

Patch

diff --git a/gcc/testsuite/gcc.dg/guality/pr59776.c b/gcc/testsuite/gcc.dg/guality/pr59776.c
index 382abb622bb..6c1c8165b70 100644
--- a/gcc/testsuite/gcc.dg/guality/pr59776.c
+++ b/gcc/testsuite/gcc.dg/guality/pr59776.c
@@ -12,11 +12,11 @@  foo (struct S *p)
   struct S s1, s2;			/* { dg-final { gdb-test pr59776.c:17 "s1.f" "5.0" } } */
   s1 = *p;				/* { dg-final { gdb-test pr59776.c:17 "s1.g" "6.0" } } */
   s2 = s1;				/* { dg-final { gdb-test pr59776.c:17 "s2.f" "0.0" } } */
-  *(int *) &s2.f = 0;			/* { dg-final { gdb-test pr59776.c:17 "s2.g" "6.0" } } */
+  *(int *) &s2.f = 0;			/* { dg-final { gdb-test pr59776.c:17 "s2.g" "6.0" { xfail *-*-* } } } */
   asm volatile (NOP : : : "memory");	/* { dg-final { gdb-test pr59776.c:20 "s1.f" "5.0" } } */
   asm volatile (NOP : : : "memory");	/* { dg-final { gdb-test pr59776.c:20 "s1.g" "6.0" } } */
   s2 = s1;				/* { dg-final { gdb-test pr59776.c:20 "s2.f" "5.0" } } */
-  asm volatile (NOP : : : "memory");	/* { dg-final { gdb-test pr59776.c:20 "s2.g" "6.0" } } */
+  asm volatile (NOP : : : "memory");	/* { dg-final { gdb-test pr59776.c:20 "s2.g" "6.0" { xfail *-*-* } } } */
   asm volatile (NOP : : : "memory");
 }
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c
new file mode 100644
index 00000000000..37ab9765db0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr92706-2.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-esra" } */
+
+typedef __UINT64_TYPE__ uint64_t;
+typedef __UINT32_TYPE__ uint32_t;
+struct S { uint32_t i[2]; } __attribute__((aligned(__alignof__(uint64_t))));
+typedef uint64_t my_int64 __attribute__((may_alias));
+uint64_t load (void *p)
+{
+  struct S u, v, w;
+  uint64_t tem;
+  tem = *(my_int64 *)p;
+  *(my_int64 *)&v = tem;
+  u = v;
+  w = u;
+  return *(my_int64 *)&w;
+}
+
+/* { dg-final { scan-tree-dump "Created a replacement for v" "esra" } } */
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 36106fecaf1..2b0849858de 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -211,8 +211,11 @@  struct access
      is not propagated in the access tree in any direction.  */
   unsigned grp_scalar_write : 1;
 
-  /* Is this access an artificial one created to scalarize some record
-     entirely? */
+  /* In a root of an access tree, true means that the entire tree should be
+     totally scalarized - that all scalar leaves should be scalarized and
+     non-root grp_total_scalarization accesses should be honored.  Otherwise,
+     non-root accesses with grp_total_scalarization should never get scalar
+     replacements.  */
   unsigned grp_total_scalarization : 1;
 
   /* Other passes of the analysis use this bit to make function
@@ -485,6 +488,15 @@  find_access_in_subtree (struct access *access, HOST_WIDE_INT offset,
       access = child;
     }
 
+  /* Total scalarization does not replace single field structures with their
+     single field but rather creates an access for them underneath.  Look for
+     it.  */
+  if (access)
+    while (access->first_child
+	   && access->first_child->offset == offset
+	   && access->first_child->size == size)
+      access = access->first_child;
+
   return access;
 }
 
@@ -856,7 +868,8 @@  create_access (tree expr, gimple *stmt, bool write)
 static bool
 scalarizable_type_p (tree type, bool const_decl)
 {
-  gcc_assert (!is_gimple_reg_type (type));
+  if (is_gimple_reg_type (type))
+    return true;
   if (type_contains_placeholder_p (type))
     return false;
 
@@ -871,8 +884,7 @@  scalarizable_type_p (tree type, bool const_decl)
 	  if (DECL_BIT_FIELD (fld))
 	    return false;
 
-	  if (!is_gimple_reg_type (ft)
-	      && !scalarizable_type_p (ft, const_decl))
+	  if (!scalarizable_type_p (ft, const_decl))
 	    return false;
 	}
 
@@ -902,8 +914,7 @@  scalarizable_type_p (tree type, bool const_decl)
 	return false;
 
       tree elem = TREE_TYPE (type);
-      if (!is_gimple_reg_type (elem)
-	  && !scalarizable_type_p (elem, const_decl))
+      if (!scalarizable_type_p (elem, const_decl))
 	return false;
       return true;
     }
@@ -912,114 +923,6 @@  scalarizable_type_p (tree type, bool const_decl)
   }
 }
 
-static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree, tree);
-
-/* Create total_scalarization accesses for all scalar fields of a member
-   of type DECL_TYPE conforming to scalarizable_type_p.  BASE
-   must be the top-most VAR_DECL representing the variable; within that,
-   OFFSET locates the member and REF must be the memory reference expression for
-   the member.  */
-
-static void
-completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
-{
-  switch (TREE_CODE (decl_type))
-    {
-    case RECORD_TYPE:
-      for (tree fld = TYPE_FIELDS (decl_type); fld; fld = DECL_CHAIN (fld))
-	if (TREE_CODE (fld) == FIELD_DECL)
-	  {
-	    HOST_WIDE_INT pos = offset + int_bit_position (fld);
-	    tree ft = TREE_TYPE (fld);
-	    tree nref = build3 (COMPONENT_REF, ft, ref, fld, NULL_TREE);
-
-	    scalarize_elem (base, pos, tree_to_uhwi (DECL_SIZE (fld)),
-			    TYPE_REVERSE_STORAGE_ORDER (decl_type),
-			    nref, ft);
-	  }
-      break;
-    case ARRAY_TYPE:
-      {
-	tree elemtype = TREE_TYPE (decl_type);
-	tree elem_size = TYPE_SIZE (elemtype);
-	gcc_assert (elem_size && tree_fits_shwi_p (elem_size));
-	HOST_WIDE_INT el_size = tree_to_shwi (elem_size);
-	gcc_assert (el_size > 0);
-
-	tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type));
-	gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
-	tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type));
-	/* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
-	if (maxidx)
-	  {
-	    gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
-	    tree domain = TYPE_DOMAIN (decl_type);
-	    /* MINIDX and MAXIDX are inclusive, and must be interpreted in
-	       DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
-	    offset_int idx = wi::to_offset (minidx);
-	    offset_int max = wi::to_offset (maxidx);
-	    if (!TYPE_UNSIGNED (domain))
-	      {
-		idx = wi::sext (idx, TYPE_PRECISION (domain));
-		max = wi::sext (max, TYPE_PRECISION (domain));
-	      }
-	    for (int el_off = offset; idx <= max; ++idx)
-	      {
-		tree nref = build4 (ARRAY_REF, elemtype,
-				    ref,
-				    wide_int_to_tree (domain, idx),
-				    NULL_TREE, NULL_TREE);
-		scalarize_elem (base, el_off, el_size,
-				TYPE_REVERSE_STORAGE_ORDER (decl_type),
-				nref, elemtype);
-		el_off += el_size;
-	      }
-	  }
-      }
-      break;
-    default:
-      gcc_unreachable ();
-    }
-}
-
-/* Create total_scalarization accesses for a member of type TYPE, which must
-   satisfy either is_gimple_reg_type or scalarizable_type_p.  BASE must be the
-   top-most VAR_DECL representing the variable; within that, POS and SIZE locate
-   the member, REVERSE gives its torage order. and REF must be the reference
-   expression for it.  */
-
-static void
-scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
-		tree ref, tree type)
-{
-  if (is_gimple_reg_type (type))
-  {
-    struct access *access = create_access_1 (base, pos, size);
-    access->expr = ref;
-    access->type = type;
-    access->grp_total_scalarization = 1;
-    access->reverse = reverse;
-    /* Accesses for intraprocedural SRA can have their stmt NULL.  */
-  }
-  else
-    completely_scalarize (base, type, pos, ref);
-}
-
-/* Create a total_scalarization access for VAR as a whole.  VAR must be of a
-   RECORD_TYPE or ARRAY_TYPE conforming to scalarizable_type_p.  */
-
-static void
-create_total_scalarization_access (tree var)
-{
-  HOST_WIDE_INT size = tree_to_uhwi (DECL_SIZE (var));
-  struct access *access;
-
-  access = create_access_1 (var, 0, size);
-  access->expr = var;
-  access->type = TREE_TYPE (var);
-  access->grp_total_scalarization = 1;
-}
-
 /* Return true if REF has an VIEW_CONVERT_EXPR somewhere in it.  */
 
 static inline bool
@@ -2029,7 +1932,6 @@  sort_and_splice_var_accesses (tree var)
       bool grp_assignment_read = access->grp_assignment_read;
       bool grp_assignment_write = access->grp_assignment_write;
       bool multiple_scalar_reads = false;
-      bool total_scalarization = access->grp_total_scalarization;
       bool grp_partial_lhs = access->grp_partial_lhs;
       bool first_scalar = is_gimple_reg_type (access->type);
       bool unscalarizable_region = access->grp_unscalarizable_region;
@@ -2081,7 +1983,6 @@  sort_and_splice_var_accesses (tree var)
 	  grp_assignment_write |= ac2->grp_assignment_write;
 	  grp_partial_lhs |= ac2->grp_partial_lhs;
 	  unscalarizable_region |= ac2->grp_unscalarizable_region;
-	  total_scalarization |= ac2->grp_total_scalarization;
 	  relink_to_new_repr (access, ac2);
 
 	  /* If there are both aggregate-type and scalar-type accesses with
@@ -2122,9 +2023,7 @@  sort_and_splice_var_accesses (tree var)
       access->grp_scalar_write = grp_scalar_write;
       access->grp_assignment_read = grp_assignment_read;
       access->grp_assignment_write = grp_assignment_write;
-      access->grp_hint = total_scalarization
-	|| (multiple_scalar_reads && !constant_decl_p (var));
-      access->grp_total_scalarization = total_scalarization;
+      access->grp_hint = multiple_scalar_reads && !constant_decl_p (var);
       access->grp_partial_lhs = grp_partial_lhs;
       access->grp_unscalarizable_region = unscalarizable_region;
       access->grp_same_access_path = grp_same_access_path;
@@ -2420,15 +2319,16 @@  expr_with_var_bounded_array_refs_p (tree expr)
 }
 
 /* Analyze the subtree of accesses rooted in ROOT, scheduling replacements when
-   both seeming beneficial and when ALLOW_REPLACEMENTS allows it.  Also set all
-   sorts of access flags appropriately along the way, notably always set
-   grp_read and grp_assign_read according to MARK_READ and grp_write when
-   MARK_WRITE is true.
+   both seeming beneficial and when ALLOW_REPLACEMENTS allows it.  If TOTALLY
+   is set, we are totally scalarizing the aggregate.  Also set all sorts of
+   access flags appropriately along the way, notably always set grp_read and
+   grp_assign_read according to MARK_READ and grp_write when MARK_WRITE is
+   true.
 
    Creating a replacement for a scalar access is considered beneficial if its
-   grp_hint is set (this means we are either attempting total scalarization or
-   there is more than one direct read access) or according to the following
-   table:
+   grp_hint or TOTALLY is set (this means either that there is more than one
+   direct read access or that we are attempting total scalarization) or
+   according to the following table:
 
    Access written to through a scalar type (once or more times)
    |
@@ -2459,7 +2359,7 @@  expr_with_var_bounded_array_refs_p (tree expr)
 
 static bool
 analyze_access_subtree (struct access *root, struct access *parent,
-			bool allow_replacements)
+			bool allow_replacements, bool totally)
 {
   struct access *child;
   HOST_WIDE_INT limit = root->offset + root->size;
@@ -2477,8 +2377,6 @@  analyze_access_subtree (struct access *root, struct access *parent,
 	root->grp_write = 1;
       if (parent->grp_assignment_write)
 	root->grp_assignment_write = 1;
-      if (parent->grp_total_scalarization)
-	root->grp_total_scalarization = 1;
       if (!parent->grp_same_access_path)
 	root->grp_same_access_path = 0;
     }
@@ -2493,10 +2391,10 @@  analyze_access_subtree (struct access *root, struct access *parent,
     {
       hole |= covered_to < child->offset;
       sth_created |= analyze_access_subtree (child, root,
-					     allow_replacements && !scalar);
+					     allow_replacements && !scalar,
+					     totally);
 
       root->grp_unscalarized_data |= child->grp_unscalarized_data;
-      root->grp_total_scalarization &= child->grp_total_scalarization;
       if (child->grp_covered)
 	covered_to += child->size;
       else
@@ -2504,7 +2402,9 @@  analyze_access_subtree (struct access *root, struct access *parent,
     }
 
   if (allow_replacements && scalar && !root->first_child
-      && (root->grp_hint
+      && (totally || !root->grp_total_scalarization)
+      && (totally
+	  || root->grp_hint
 	  || ((root->grp_scalar_read || root->grp_assignment_read)
 	      && (root->grp_scalar_write || root->grp_assignment_write))))
     {
@@ -2546,6 +2446,7 @@  analyze_access_subtree (struct access *root, struct access *parent,
     {
       if (allow_replacements
 	  && scalar && !root->first_child
+	  && !root->grp_total_scalarization
 	  && (root->grp_scalar_write || root->grp_assignment_write)
 	  && !bitmap_bit_p (cannot_scalarize_away_bitmap,
 			    DECL_UID (root->base)))
@@ -2566,7 +2467,7 @@  analyze_access_subtree (struct access *root, struct access *parent,
 	root->grp_total_scalarization = 0;
     }
 
-  if (!hole || root->grp_total_scalarization)
+  if (!hole || totally)
     root->grp_covered = 1;
   else if (root->grp_write || comes_initialized_p (root->base))
     root->grp_unscalarized_data = 1; /* not covered and written to */
@@ -2582,7 +2483,8 @@  analyze_access_trees (struct access *access)
 
   while (access)
     {
-      if (analyze_access_subtree (access, NULL, true))
+      if (analyze_access_subtree (access, NULL, true,
+				  access->grp_total_scalarization))
 	ret = true;
       access = access->next_grp;
     }
@@ -2855,6 +2757,369 @@  propagate_all_subaccesses (void)
     }
 }
 
+/* Return true if the forest beginning with ROOT does not contain
+   unscalarizable regions or non-byte aligned accesses.  */
+
+static bool
+can_totally_scalarize_forest_p (struct access *root)
+{
+  struct access *access = root;
+  do
+    {
+      if (access->grp_unscalarizable_region
+	  || (access->offset % BITS_PER_UNIT) != 0
+	  || (access->size % BITS_PER_UNIT) != 0
+	  || (is_gimple_reg_type (access->type)
+	      && access->first_child))
+	return false;
+
+      if (access->first_child)
+	access = access->first_child;
+      else if (access->next_sibling)
+	access = access->next_sibling;
+      else
+	{
+	  while (access->parent && !access->next_sibling)
+	    access = access->parent;
+	  if (access->next_sibling)
+	    access = access->next_sibling;
+	  else
+	    {
+	      gcc_assert (access == root);
+	      root = root->next_grp;
+	      access = root;
+	    }
+	}
+    }
+  while (access);
+  return true;
+}
+
+/* Create and return an ACCESS in PARENT spanning from POS with SIZE, TYPE and
+   reference EXPR for total scalarization purposes and mark it as such.  Within
+   the children of PARENT, link it in between PTR and NEXT_SIBLING.  */
+
+static struct access *
+create_total_scalarization_access (struct access *parent, HOST_WIDE_INT pos,
+				   HOST_WIDE_INT size, tree type, tree expr,
+				   struct access **ptr,
+				   struct access *next_sibling)
+{
+  struct access *access = access_pool.allocate ();
+  memset (access, 0, sizeof (struct access));
+  access->base = parent->base;
+  access->offset = pos;
+  access->size = size;
+  access->expr = expr;
+  access->type = type;
+  access->parent = parent;
+  access->grp_write = parent->grp_write;
+  access->grp_total_scalarization = 1;
+  access->grp_hint = 1;
+  access->grp_same_access_path = path_comparable_for_same_access (expr);
+  access->reverse = reverse_storage_order_for_component_p (expr);
+
+  access->next_sibling = next_sibling;
+  *ptr = access;
+  return access;
+}
+
+/* Create and return an ACCESS in PARENT spanning from POS with SIZE, TYPE and
+   reference EXPR for total scalarization purposes and mark it as such, link it
+   at *PTR and reshape the tree so that those elements at *PTR and their
+   siblings which fall within the part described by POS and SIZE are moved to
+   be children of the new access.  If a partial overlap is detected, return
+   NULL.  */
+
+static struct access *
+create_total_access_and_reshape (struct access *parent, HOST_WIDE_INT pos,
+				 HOST_WIDE_INT size, tree type, tree expr,
+				 struct access **ptr)
+{
+  struct access **p = ptr;
+
+  while (*p && (*p)->offset < pos + size)
+    {
+      if ((*p)->offset + (*p)->size > pos + size)
+	return NULL;
+      p = &(*p)->next_sibling;
+    }
+
+  struct access *next_child = *ptr;
+  struct access *new_acc
+    = create_total_scalarization_access (parent, pos, size, type, expr,
+					 ptr, *p);
+  if (p != ptr)
+    {
+      new_acc->first_child = next_child;
+      *p = NULL;
+      for (struct access *a = next_child; a; a = a->next_sibling)
+	a->parent = new_acc;
+    }
+  return new_acc;
+}
+
+static bool totally_scalarize_subtree (struct access *root);
+
+/* Return true if INNER is either the same type as OUTER or if it is the type
+   of a record field in OUTER at offset zero, possibly in nested
+   sub-records.  */
+
+static bool
+access_and_field_type_match_p (tree outer, tree inner)
+{
+  if (TYPE_MAIN_VARIANT (outer) == TYPE_MAIN_VARIANT (inner))
+    return true;
+  if (TREE_CODE (outer) != RECORD_TYPE)
+    return false;
+  tree fld = TYPE_FIELDS (outer);
+  while (fld)
+    {
+     if (TREE_CODE (fld) == FIELD_DECL)
+       {
+	if (!zerop (DECL_FIELD_OFFSET (fld)))
+	  return false;
+	if (TYPE_MAIN_VARIANT (TREE_TYPE (fld)) == inner)
+	  return true;
+	if (TREE_CODE (TREE_TYPE (fld)) == RECORD_TYPE)
+	  fld = TYPE_FIELDS (TREE_TYPE (fld));
+	else
+	  return false;
+       }
+     else
+       fld = DECL_CHAIN (fld);
+    }
+  return false;
+}
+
+/* Return type of total_should_skip_creating_access indicating whether a total
+   scalarization access for a field/element should be created, whether it
+   already exists or whether the entire total scalarization has to fail.  */
+
+enum total_sra_field_state {TOTAL_FLD_CREATE, TOTAL_FLD_DONE, TOTAL_FLD_FAILED};
+
+/* Do all the necessary steps in total scalarization when a part of the given
+   aggregate of type TYPE at POS with the given SIZE should be put into PARENT
+   and when we have processed all its siblings with smaller offsets up to and
+   including LAST_SEEN_SIBLING (which can be NULL).
+
+   If some further siblings are to be skipped, set *LAST_SEEN_SIBLING as
+   appropriate.  Return TOTAL_FLD_CREATE if the caller should carry on with
+   creating a new access, TOTAL_FLD_DONE if an access or accesses capable of
+   representing the described part of the aggregate for the purposes of total
+   scalarization already exist or TOTAL_FLD_FAILED if there is a problem which
+   prevents total scalarization from happening at all.  */
+
+static enum total_sra_field_state
+total_should_skip_creating_access (struct access *parent,
+				   struct access **last_seen_sibling,
+				   tree type, HOST_WIDE_INT pos,
+				   HOST_WIDE_INT size)
+{
+  struct access *next_child;
+  if (!*last_seen_sibling)
+    next_child = parent->first_child;
+  else
+    next_child = (*last_seen_sibling)->next_sibling;
+
+  /* First, traverse the chain of siblings until it points to an access with
+     offset at least equal to POS.  Check all skipped accesses whether they
+     span the POS boundary and if so, return with a failure.  */
+  while (next_child && next_child->offset < pos)
+    {
+      if (next_child->offset + next_child->size > pos)
+	return TOTAL_FLD_FAILED;
+      *last_seen_sibling = next_child;
+      next_child = next_child->next_sibling;
+    }
+
+  /* Now check whether next_child has exactly the right POS and SIZE and if so,
+     whether it can represent what we need and can be totally scalarized
+     itself.  */
+  if (next_child && next_child->offset == pos
+      && next_child->size == size)
+    {
+      if (!is_gimple_reg_type (next_child->type)
+	  && (!access_and_field_type_match_p (type, next_child->type)
+	      || !totally_scalarize_subtree (next_child)))
+	return TOTAL_FLD_FAILED;
+
+      *last_seen_sibling = next_child;
+      return TOTAL_FLD_DONE;
+    }
+
+  /* If the child we're looking at would partially overlap, we just cannot
+     totally scalarize.  */
+  if (next_child
+      && next_child->offset < pos + size
+      && next_child->offset + next_child->size > pos + size)
+    return TOTAL_FLD_FAILED;
+
+  if (is_gimple_reg_type (type))
+    {
+      /* We don't scalarize accesses that are children of other scalar type
+	 accesses, so if we go on and create an access for a register type,
+	 there should not be any pre-existing children.  There are rare cases
+	 where the requested type is a vector but we already have register
+	 accesses for all its elements, which is equally good.  Detect whether
+	 that is the case or whether we need to bail out.  */
+
+      HOST_WIDE_INT covered = pos;
+      bool skipping = false;
+      while (next_child
+	     && next_child->offset + next_child->size <= pos + size)
+	{
+	  if (next_child->offset != covered
+	      || !is_gimple_reg_type (next_child->type))
+	    return TOTAL_FLD_FAILED;
+
+	  covered += next_child->size;
+	  *last_seen_sibling = next_child;
+	  next_child = next_child->next_sibling;
+	  skipping = true;
+	}
+
+      if (skipping)
+	{
+	  if (covered != pos + size)
+	    return TOTAL_FLD_FAILED;
+	  else
+	    return TOTAL_FLD_DONE;
+	}
+    }
+
+  return TOTAL_FLD_CREATE;
+}
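
Stripped of the tree bookkeeping and the register-type special case, the
decision above is plain interval arithmetic over a sorted list of
non-overlapping spans.  A rough standalone sketch, with hypothetical type and
function names and assuming the existing spans are sorted by offset and do not
overlap each other:

enum span_state { SPAN_CREATE, SPAN_DONE, SPAN_FAILED };

struct span { long offset, size; };

/* Classify the requested range [POS, POS + SIZE) against N existing,
   non-overlapping spans sorted by offset.  */
static enum span_state
classify_range (const struct span *spans, int n, long pos, long size)
{
  int i = 0;

  /* Skip spans entirely before POS; one straddling POS is fatal.  */
  while (i < n && spans[i].offset < pos)
    {
      if (spans[i].offset + spans[i].size > pos)
        return SPAN_FAILED;
      i++;
    }

  /* An exact match means the range is already represented.  */
  if (i < n && spans[i].offset == pos && spans[i].size == size)
    return SPAN_DONE;

  /* A span sticking out past the end of the range is fatal too.  */
  if (i < n
      && spans[i].offset < pos + size
      && spans[i].offset + spans[i].size > pos + size)
    return SPAN_FAILED;

  return SPAN_CREATE;
}

The register-type branch in the real function handles one more case on top of
this picture: a run of existing scalar accesses that exactly tiles the
requested range also counts as already represented (TOTAL_FLD_DONE).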
+
+/* Go over the sub-tree rooted in ROOT and attempt to create scalar accesses
+   spanning all areas covered by ROOT that are not yet covered by any
+   existing access; return false if the attempt fails.  All created accesses
+   will have grp_total_scalarization set (and should be ignored if the
+   function returns false).  */
+
+static bool
+totally_scalarize_subtree (struct access *root)
+{
+  gcc_checking_assert (!root->grp_unscalarizable_region);
+  gcc_checking_assert (!is_gimple_reg_type (root->type));
+
+  struct access *last_seen_sibling = NULL;
+
+  switch (TREE_CODE (root->type))
+    {
+    case RECORD_TYPE:
+      for (tree fld = TYPE_FIELDS (root->type); fld; fld = DECL_CHAIN (fld))
+	if (TREE_CODE (fld) == FIELD_DECL)
+	  {
+	    tree ft = TREE_TYPE (fld);
+	    HOST_WIDE_INT fsize = tree_to_uhwi (DECL_SIZE (fld));
+	    if (!fsize)
+	      continue;
+
+	    HOST_WIDE_INT pos = root->offset + int_bit_position (fld);
+	    enum total_sra_field_state
+	      state = total_should_skip_creating_access (root,
+							 &last_seen_sibling,
+							 ft, pos, fsize);
+	    switch (state)
+	      {
+	      case TOTAL_FLD_FAILED:
+		return false;
+	      case TOTAL_FLD_DONE:
+		continue;
+	      case TOTAL_FLD_CREATE:
+		break;
+	      default:
+		gcc_unreachable ();
+	      }
+
+	    struct access **p = (last_seen_sibling
+				 ? &last_seen_sibling->next_sibling
+				 : &root->first_child);
+	    tree nref = build3 (COMPONENT_REF, ft, root->expr, fld, NULL_TREE);
+	    struct access *new_child
+	      = create_total_access_and_reshape (root, pos, fsize, ft, nref, p);
+	    if (!new_child)
+	      return false;
+
+	    if (!is_gimple_reg_type (ft)
+		&& !totally_scalarize_subtree (new_child))
+	      return false;
+	    last_seen_sibling = new_child;
+	  }
+      break;
+    case ARRAY_TYPE:
+      {
+	tree elemtype = TREE_TYPE (root->type);
+	tree elem_size = TYPE_SIZE (elemtype);
+	gcc_assert (elem_size && tree_fits_shwi_p (elem_size));
+	HOST_WIDE_INT el_size = tree_to_shwi (elem_size);
+	gcc_assert (el_size > 0);
+
+	tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (root->type));
+	gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
+	tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (root->type));
+	/* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1.  */
+	if (!maxidx)
+	  goto out;
+	gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
+	tree domain = TYPE_DOMAIN (root->type);
+	/* MINIDX and MAXIDX are inclusive, and must be interpreted in
+	   DOMAIN (e.g. signed int, whereas min/max may be size_int).  */
+	offset_int idx = wi::to_offset (minidx);
+	offset_int max = wi::to_offset (maxidx);
+	if (!TYPE_UNSIGNED (domain))
+	  {
+	    idx = wi::sext (idx, TYPE_PRECISION (domain));
+	    max = wi::sext (max, TYPE_PRECISION (domain));
+	  }
+	for (HOST_WIDE_INT pos = root->offset;
+	     idx <= max;
+	     pos += el_size, ++idx)
+	  {
+	    enum total_sra_field_state
+	      state = total_should_skip_creating_access (root,
+							 &last_seen_sibling,
+							 elemtype, pos,
+							 el_size);
+	    switch (state)
+	      {
+	      case TOTAL_FLD_FAILED:
+		return false;
+	      case TOTAL_FLD_DONE:
+		continue;
+	      case TOTAL_FLD_CREATE:
+		break;
+	      default:
+		gcc_unreachable ();
+	      }
+
+	    struct access **p = (last_seen_sibling
+				 ? &last_seen_sibling->next_sibling
+				 : &root->first_child);
+	    tree nref = build4 (ARRAY_REF, elemtype, root->expr,
+				wide_int_to_tree (domain, idx),
+				NULL_TREE, NULL_TREE);
+	    struct access *new_child
+	      = create_total_access_and_reshape (root, pos, el_size, elemtype,
+						 nref, p);
+	    if (!new_child)
+	      return false;
+
+	    if (!is_gimple_reg_type (elemtype)
+		&& !totally_scalarize_subtree (new_child))
+	      return false;
+	    last_seen_sibling = new_child;
+	  }
+      }
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+ out:
+  return true;
+}
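
The recursion itself simply follows the shape of the type; what is new
relative to the old total scalarization is the consultation of
total_should_skip_creating_access at every member.  As a reminder of the
type-walk skeleton only, here is a small self-contained sketch with
hypothetical miniature type descriptions (offsets in bits); it deliberately
ignores pre-existing accesses, which is exactly the part the code above adds:

#include <stdio.h>

/* Hypothetical miniature type description: a scalar, a record made of
   fields at fixed offsets, or an array of identical elements.  */
struct mini_type;

struct mini_field
{
  long offset;                   /* bit offset within the record */
  const struct mini_type *type;  /* NULL terminates the field list */
};

struct mini_type
{
  enum { MT_SCALAR, MT_RECORD, MT_ARRAY } kind;
  long size;                          /* total size of the type in bits */
  const struct mini_field *fields;    /* MT_RECORD only */
  const struct mini_type *elem;       /* MT_ARRAY only */
  long nelts;                         /* MT_ARRAY only */
};

/* Print one scalar leaf for every piece of TYPE located at bit POS.  */
static void
flatten (const struct mini_type *type, long pos)
{
  switch (type->kind)
    {
    case MT_SCALAR:
      printf ("scalar at bit %ld, size %ld\n", pos, type->size);
      break;
    case MT_RECORD:
      for (const struct mini_field *f = type->fields; f->type; f++)
        flatten (f->type, pos + f->offset);
      break;
    case MT_ARRAY:
      for (long i = 0; i < type->nelts; i++)
        flatten (type->elem, pos + i * type->elem->size);
      break;
    }
}

For a struct containing an int followed by a two-element short array, this
would emit one leaf for the int and one per array element, mirroring the
artificial accesses the patch creates when nothing conflicting exists yet.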
+
 /* Go through all accesses collected throughout the (intraprocedural) analysis
    stage, exclude overlapping ones, identify representatives and build trees
    out of them, making decisions about scalarization on the way.  Return true
@@ -2867,8 +3132,22 @@  analyze_all_variable_accesses (void)
   bitmap tmp = BITMAP_ALLOC (NULL);
   bitmap_iterator bi;
   unsigned i;
-  bool optimize_speed_p = !optimize_function_for_size_p (cfun);
 
+  bitmap_copy (tmp, candidate_bitmap);
+  EXECUTE_IF_SET_IN_BITMAP (tmp, 0, i, bi)
+    {
+      tree var = candidate (i);
+      struct access *access;
+
+      access = sort_and_splice_var_accesses (var);
+      if (!access || !build_access_trees (access))
+	disqualify_candidate (var,
+			      "No or inhibitingly overlapping accesses.");
+    }
+
+  propagate_all_subaccesses ();
+
+  bool optimize_speed_p = !optimize_function_for_size_p (cfun);
   /* If the user didn't set PARAM_SRA_MAX_SCALARIZATION_SIZE_<...>,
      fall back to a target default.  */
   unsigned HOST_WIDE_INT max_scalarization_size
@@ -2884,7 +3163,6 @@  analyze_all_variable_accesses (void)
       if (global_options_set.x_param_sra_max_scalarization_size_size)
 	max_scalarization_size = param_sra_max_scalarization_size_size;
     }
-
   max_scalarization_size *= BITS_PER_UNIT;
 
   EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi)
@@ -2892,46 +3170,56 @@  analyze_all_variable_accesses (void)
 	&& !bitmap_bit_p (cannot_scalarize_away_bitmap, i))
       {
 	tree var = candidate (i);
+	if (!VAR_P (var))
+	  continue;
 
-	if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var),
-						constant_decl_p (var)))
+	if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) > max_scalarization_size)
 	  {
-	    if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
-		<= max_scalarization_size)
-	      {
-		create_total_scalarization_access (var);
-		completely_scalarize (var, TREE_TYPE (var), 0, var);
-		statistics_counter_event (cfun,
-					  "Totally-scalarized aggregates", 1);
-		if (dump_file && (dump_flags & TDF_DETAILS))
-		  {
-		    fprintf (dump_file, "Will attempt to totally scalarize ");
-		    print_generic_expr (dump_file, var);
-		    fprintf (dump_file, " (UID: %u): \n", DECL_UID (var));
-		  }
-	      }
-	    else if (dump_file && (dump_flags & TDF_DETAILS))
+	    if (dump_file && (dump_flags & TDF_DETAILS))
 	      {
 		fprintf (dump_file, "Too big to totally scalarize: ");
 		print_generic_expr (dump_file, var);
 		fprintf (dump_file, " (UID: %u)\n", DECL_UID (var));
 	      }
+	    continue;
 	  }
-      }
 
-  bitmap_copy (tmp, candidate_bitmap);
-  EXECUTE_IF_SET_IN_BITMAP (tmp, 0, i, bi)
-    {
-      tree var = candidate (i);
-      struct access *access;
+	bool all_types_ok = true;
+	for (struct access *access = get_first_repr_for_decl (var);
+	     access;
+	     access = access->next_grp)
+	  if (!can_totally_scalarize_forest_p (access)
+	      || !scalarizable_type_p (access->type, constant_decl_p (var)))
+	    {
+	      all_types_ok = false;
+	      break;
+	    }
+	if (!all_types_ok)
+	  continue;
 
-      access = sort_and_splice_var_accesses (var);
-      if (!access || !build_access_trees (access))
-	disqualify_candidate (var,
-			      "No or inhibitingly overlapping accesses.");
-    }
+	if (dump_file && (dump_flags & TDF_DETAILS))
+	  {
+	    fprintf (dump_file, "Will attempt to totally scalarize ");
+	    print_generic_expr (dump_file, var);
+	    fprintf (dump_file, " (UID: %u): \n", DECL_UID (var));
+	  }
+	bool scalarized = true;
+	for (struct access *access = get_first_repr_for_decl (var);
+	     access;
+	     access = access->next_grp)
+	  if (!is_gimple_reg_type (access->type)
+	      && !totally_scalarize_subtree (access))
+	    {
+	      scalarized = false;
+	      break;
+	    }
 
-  propagate_all_subaccesses ();
+	if (scalarized)
+	  for (struct access *access = get_first_repr_for_decl (var);
+	       access;
+	       access = access->next_grp)
+	    access->grp_total_scalarization = true;
+      }
 
   if (flag_checking)
     verify_all_sra_access_forests ();
@@ -3804,25 +4092,39 @@  initialize_constant_pool_replacements (void)
       tree var = candidate (i);
       if (!constant_decl_p (var))
 	continue;
-      vec<access_p> *access_vec = get_base_access_vector (var);
-      if (!access_vec)
-	continue;
-      for (unsigned i = 0; i < access_vec->length (); i++)
+
+      struct access *access = get_first_repr_for_decl (var);
+
+      while (access)
 	{
-	  struct access *access = (*access_vec)[i];
-	  if (!access->replacement_decl)
-	    continue;
-	  gassign *stmt
-	    = gimple_build_assign (get_access_replacement (access),
-				   unshare_expr (access->expr));
-	  if (dump_file && (dump_flags & TDF_DETAILS))
+	  if (access->replacement_decl)
 	    {
-	      fprintf (dump_file, "Generating constant initializer: ");
-	      print_gimple_stmt (dump_file, stmt, 0);
-	      fprintf (dump_file, "\n");
+	      gassign *stmt
+		= gimple_build_assign (get_access_replacement (access),
+				       unshare_expr (access->expr));
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		{
+		  fprintf (dump_file, "Generating constant initializer: ");
+		  print_gimple_stmt (dump_file, stmt, 0);
+		  fprintf (dump_file, "\n");
+		}
+	      gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+	      update_stmt (stmt);
+	    }
+
+	  if (access->first_child)
+	    access = access->first_child;
+	  else if (access->next_sibling)
+	    access = access->next_sibling;
+	  else
+	    {
+	      while (access->parent && !access->next_sibling)
+		access = access->parent;
+	      if (access->next_sibling)
+		access = access->next_sibling;
+	      else
+		access = access->next_grp;
 	    }
-	  gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
-	  update_stmt (stmt);
 	}
     }
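
The replacement loop above walks each access tree iteratively, in preorder,
using only the parent/child/sibling links.  A minimal standalone sketch of
that traversal pattern, with a hypothetical node type and a single tree (so no
equivalent of next_grp):

#include <stdio.h>

/* Hypothetical tree node with the same kind of links as above.  */
struct tnode
{
  int id;
  struct tnode *parent, *first_child, *next_sibling;
};

/* Visit every node under ROOT in preorder without recursion or an explicit
   stack.  ROOT is assumed to have neither a parent nor siblings.  */
static void
visit_preorder (struct tnode *root)
{
  struct tnode *n = root;
  while (n)
    {
      printf ("visiting node %d\n", n->id);

      if (n->first_child)
        n = n->first_child;            /* Descend first.  */
      else if (n != root && n->next_sibling)
        n = n->next_sibling;           /* Then move right.  */
      else
        {
          /* Climb until a right sibling exists or we are back at ROOT.  */
          while (n->parent && !n->next_sibling)
            n = n->parent;
          n = (n == root) ? NULL : n->next_sibling;
        }
    }
}

In the patch itself the final else arm additionally falls through to
access->next_grp, which is what moves the walk on to the next access tree of
the same constant-pool declaration.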