[RFA,PR,tree-optimization/33562,1/4] Byte tracking in DSE - v3

Message ID 23d481d6-36bd-c793-8d84-48c7e0c2e478@redhat.com
State New

Commit Message

Jeff Law Dec. 22, 2016, 6:26 a.m. UTC
This is the first of a 4-part patchkit to address deficiencies in our 
DSE implementation.

This patch addresses the P2 regression 33562, which has been a 
low-priority regression since gcc-4.3.  To summarize, DSE no longer has 
the ability to detect an aggregate store as dead if subsequent stores 
are done in a piecemeal fashion.
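
For illustration, a minimal sketch of the kind of code involved (a hypothetical example in the spirit of the PR, not its actual testcase): the aggregate zero-initialization of `a` is completely covered by the later piecemeal field stores, so byte-level tracking lets DSE prove the initialization dead.

```c
/* Sketch of the PR 33562 pattern.  The aggregate store 'a = { 0, 0 }'
   is fully covered by the subsequent field stores, so with byte-level
   tracking DSE can delete it.  Names here are illustrative only.  */
#include <assert.h>

struct S { int x, y; };

static int
consume (struct S *p)
{
  return p->x + p->y;
}

int
example (void)
{
  struct S a = { 0, 0 };   /* dead: every byte is rewritten below */
  a.x = 1;
  a.y = 2;
  return consume (&a);
}
```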

I originally tackled this by changing how we lower complex objects. That 
was sufficient to address 33562, but was reasonably rejected.

This version attacks the problem by improving DSE to track stores to 
memory at a byte level.  That allows us to determine if a series of 
stores completely covers an earlier store (thus making the earlier store 
dead).

A useful side effect of this is we can detect when parts of a store are 
dead and potentially rewrite the store.  This patch implements that for 
complex object initializations.  While not strictly part of 33562, it's 
so closely related that I felt it belongs as part of this patch.
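
A minimal sketch of that partial-store rewrite, in the shape of the new ssa-dse-20.c test (this uses GCC's `_Complex int` extension; the accessor function here is illustrative): the later store kills the imaginary half of the initialization, so the initialization can be trimmed to a REALPART_EXPR store alone.

```c
/* Sketch of complex-initialization trimming.  't = 0' stores both
   halves; '__imag__ t = 2' kills the imaginary half, so DSE can
   rewrite the first store to '__real__ t = 0'.  */
#include <assert.h>

static _Complex int t;

int
f (void)
{
  t = 0;            /* imag half is dead; trimmable to the real part */
  __imag__ t = 2;
  return (int) __real__ t + (int) __imag__ t;
}
```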

This originally limited the size of the tracked memory space to 64 
bytes.  I bumped the limit after working through the CONSTRUCTOR and 
mem* trimming patches.  The 256 byte limit is still fairly arbitrary and 
I wouldn't lose sleep if we throttled back to 64 or 128 bytes.

Later patches in the kit will build upon this patch.  So if pieces look 
like skeleton code, that's because they are.

The changes since the V2 patch are:

1. Using sbitmaps rather than bitmaps.
2. Returning a tri-state from dse_classify_store (renamed from 
dse_possible_dead_store_p)
3. More efficient trim computation
4. Moving trimming code out of dse_classify_store
5. Refactoring code to delete dead calls/assignments
6. dse_optimize_stmt moves into the dse_dom_walker class

Not surprisingly, this patch has most of the changes made in response to 
prior feedback, as it includes the raw infrastructure.

Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?

	PR tree-optimization/33562
	* params.def (PARAM_DSE_MAX_OBJECT_SIZE): New PARAM.
	* sbitmap.h (bitmap_clear_range, bitmap_set_range): Prototype new
	functions.
	(bitmap_count_bits): Likewise.
	* sbitmap.c (bitmap_clear_range, bitmap_set_range): New functions.
	(bitmap_count_bits): Likewise.
	* tree-ssa-dse.c: Include params.h.
	(dse_store_status): New enum.
	(initialize_ao_ref_for_dse): New, partially extracted from
	dse_optimize_stmt.
	(valid_ao_ref_for_dse, normalize_ref): New.
	(setup_live_bytes_from_ref, compute_trims): Likewise.
	(clear_bytes_written_by, trim_complex_store): Likewise.
	(maybe_trim_partially_dead_store): Likewise.
	(maybe_trim_complex_store): Likewise.
	(dse_classify_store): Renamed from dse_possible_dead_store_p.
	Track what bytes live from the original store.  Return tri-state
	for dead, partially dead or live.
	(dse_dom_walker): Add constructor, destructor and new private members.
	(delete_dead_call, delete_dead_assignment): New extracted from
	dse_optimize_stmt.
	(dse_optimize_stmt): Make a member of dse_dom_walker.
	Use initialize_ao_ref_for_dse.


	* gcc.dg/tree-ssa/complex-4.c: No longer xfailed.
	* gcc.dg/tree-ssa/complex-5.c: Likewise.
	* gcc.dg/tree-ssa/ssa-dse-9.c: Likewise.
	* gcc.dg/tree-ssa/ssa-dse-18.c: New test.
	* gcc.dg/tree-ssa/ssa-dse-19.c: Likewise.
	* gcc.dg/tree-ssa/ssa-dse-20.c: Likewise.
	* gcc.dg/tree-ssa/ssa-dse-21.c: Likewise.

Comments

Richard Biener Jan. 4, 2017, 1:22 p.m. UTC | #1
On Thu, Dec 22, 2016 at 7:26 AM, Jeff Law <law@redhat.com> wrote:
> This is the first of the 4 part patchkit to address deficiencies in our DSE
> implementation.
>
> This patch addresses the P2 regression 33562 which has been a low priority
> regression since gcc-4.3.  To summarize, DSE no longer has the ability to
> detect an aggregate store as dead if subsequent stores are done in a
> piecemeal fashion.
>
> I originally tackled this by changing how we lower complex objects. That was
> sufficient to address 33562, but was reasonably rejected.
>
> This version attacks the problem by improving DSE to track stores to memory
> at a byte level.  That allows us to determine if a series of stores
> completely covers an earlier store (thus making the earlier store dead).
>
> A useful side effect of this is we can detect when parts of a store are dead
> and potentially rewrite the store.  This patch implements that for complex
> object initializations.  While not strictly part of 33562, it's so closely
> related that I felt it belongs as part of this patch.
>
> This originally limited the size of the tracked memory space to 64 bytes.  I
> bumped the limit after working through the CONSTRUCTOR and mem* trimming
> patches.  The 256 byte limit is still fairly arbitrary and I wouldn't lose
> sleep if we throttled back to 64 or 128 bytes.
>
> Later patches in the kit will build upon this patch.  So if pieces look like
> skeleton code, that's because it is.
>
> The changes since the V2 patch are:
>
> 1. Using sbitmaps rather than bitmaps.
> 2. Returning a tri-state from dse_classify_store (renamed from
> dse_possible_dead_store_p)
> 3. More efficient trim computation
> 4. Moving trimming code out of dse_classify_store
> 5. Refactoring code to delete dead calls/assignments
> 6. dse_optimize_stmt moves into the dse_dom_walker class
>
> Not surprisingly, this patch has most of the changes based on prior feedback
> as it includes the raw infrastructure.
>
> Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?

New functions in sbitmap.c lack function comments.

bitmap_count_bits fails to guard against GCC_VERSION >= 3400 (the version
is not important, but non-GCC host compilers are).  See bitmap.c for a
fallback.
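
A portable fallback could look like the following sketch (the function name is illustrative; bitmap.c's actual fallback uses a byte lookup table rather than Kernighan's trick):

```c
/* Host-compiler-independent popcount fallback for bitmap_count_bits,
   for when __builtin_popcountl is unavailable (non-GCC hosts).
   Kernighan's trick clears one set bit per iteration.  */
#include <assert.h>

static unsigned int
popcount_fallback (unsigned long long x)
{
  unsigned int count = 0;
  while (x)
    {
      x &= x - 1;     /* clear the lowest set bit */
      count++;
    }
  return count;
}
```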

Both bitmap_clear_range and bitmap_set_range look rather inefficient...
(it's not likely anybody will clean this up after you)
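
For reference, a word-at-a-time version is straightforward; the sketch below uses a bare array and a local ELT_BITS as stand-ins for the sbitmap internals (SBITMAP_ELT_BITS, bmap->elms), which a real patch would use instead:

```c
/* Word-at-a-time range clear: whole words inside the range are zeroed
   with one store each, and only the partial words at the two edges are
   masked.  ELT_BITS and elt_t stand in for the sbitmap internals.  */
#include <assert.h>

#define ELT_BITS 64
typedef unsigned long long elt_t;

static void
clear_range (elt_t *elms, unsigned int start, unsigned int count)
{
  if (count == 0)
    return;
  unsigned int end = start + count;          /* one past the last bit */
  unsigned int fw = start / ELT_BITS;
  unsigned int lw = (end - 1) / ELT_BITS;

  /* Bits to KEEP in the first and last words of the range.  */
  elt_t head_keep = ((elt_t) 1 << (start % ELT_BITS)) - 1;
  elt_t tail_keep = (end % ELT_BITS)
		    ? ~(((elt_t) 1 << (end % ELT_BITS)) - 1)
		    : 0;

  if (fw == lw)
    elms[fw] &= head_keep | tail_keep;
  else
    {
      elms[fw] &= head_keep;
      for (unsigned int i = fw + 1; i < lw; i++)
	elms[i] = 0;
      elms[lw] &= tail_keep;
    }
}
```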

I'd say split out the sbitmap.[ch] changes.

+DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
+        "dse-max-object-size",
+        "Maximum size (in bytes) of objects tracked by dead store elimination.",
+        256, 0, 0)

the docs suggest that DSE doesn't handle larger stores but it does (just in
the original limited way).  Maybe "tracked bytewise" is better.

+static bool
+valid_ao_ref_for_dse (ao_ref *ref)
+{
+  return (ao_ref_base (ref)
+         && ref->max_size != -1
+         && (ref->offset % BITS_PER_UNIT) == 0
+         && (ref->size % BITS_PER_UNIT) == 0
+         && (ref->size / BITS_PER_UNIT) > 0);

I think the last test is better written as ref->size != -1.

Btw, seeing you discount non-byte size/offset stores, this somehow asks
for store-merging being done before the last DSE (it currently runs after).
Something to keep in mind for GCC 8.

+/* Delete a dead store STMT, which is mem* call of some kind.  */

call STMT

+static void
+delete_dead_call (gimple *stmt)
+{
+
excess vertical space
......
+  if (lhs)
+    {
+      tree ptr = gimple_call_arg (stmt, 0);
+      gimple *new_stmt = gimple_build_assign (lhs, ptr);
+      unlink_stmt_vdef (stmt);
+      if (gsi_replace (&gsi, new_stmt, true))
+        bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);

  release_ssa_name (gimple_vdef (stmt));


+  { m_live_bytes = sbitmap_alloc (PARAM_VALUE
(PARAM_DSE_MAX_OBJECT_SIZE));m_byte_tracking_enabled = false; }

formatting.

The DSE parts look good to me with the nits above fixed.  Just re-spin
the sbitmap.[ch] part please.

Thanks,
Richard.

>
>         PR tree-optimization/33562
>         * params.def (PARM_DSE_MAX_OBJECT_SIZE): New PARAM.
>         * sbitmap.h (bitmap_clear_range, bitmap_set_range): Prototype new
>         functions.
>         (bitmap_count_bits): Likewise.
>         * sbitmap.c (bitmap_clear_range, bitmap_set_range): New functions.
>         (bitmap_count_bits): Likewise.
>         * tree-ssa-dse.c: Include params.h.
>         (dse_store_status): New enum.
>         (initialize_ao_ref_for_dse): New, partially extracted from
>         dse_optimize_stmt.
>         (valid_ao_ref_for_dse, normalize_ref): New.
>         (setup_live_bytes_from_ref, compute_trims): Likewise.
>         (clear_bytes_written_by, trim_complex_store): Likewise.
>         (maybe_trim_partially_dead_store): Likewise.
>         (maybe_trim_complex_store): Likewise.
>         (dse_classify_store): Renamed from dse_possibly_dead_store_p.
>         Track what bytes live from the original store.  Return tri-state
>         for dead, partially dead or live.
>         (dse_dom_walker): Add constructor, destructor and new private
> members.
>         (delete_dead_call, delete_dead_assignment): New extracted from
>         dse_optimize_stmt.
>         (dse_optimize_stmt): Make a member of dse_dom_walker.
>         Use initialize_ao_ref_for_dse.
>
>
>         * gcc.dg/tree-ssa/complex-4.c: No longer xfailed.
>         * gcc.dg/tree-ssa/complex-5.c: Likewise.
>         * gcc.dg/tree-ssa/ssa-dse-9.c: Likewise.
>         * gcc.dg/tree-ssa/ssa-dse-18.c: New test.
>         * gcc.dg/tree-ssa/ssa-dse-19.c: Likewise.
>         * gcc.dg/tree-ssa/ssa-dse-20.c: Likewise.
>         * gcc.dg/tree-ssa/ssa-dse-21.c: Likewise.
>
> diff --git a/gcc/params.def b/gcc/params.def
> index 50f75a7..f367c1d 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -532,6 +532,11 @@ DEFPARAM(PARAM_AVG_LOOP_NITER,
>          "Average number of iterations of a loop.",
>          10, 1, 0)
>
> +DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
> +        "dse-max-object-size",
> +        "Maximum size (in bytes) of objects tracked by dead store
> elimination.",
> +        256, 0, 0)
> +
>  DEFPARAM(PARAM_SCEV_MAX_EXPR_SIZE,
>          "scev-max-expr-size",
>          "Bound on size of expressions used in the scalar evolutions
> analyzer.",
> diff --git a/gcc/sbitmap.c b/gcc/sbitmap.c
> index 10b4347..2b66a6c 100644
> --- a/gcc/sbitmap.c
> +++ b/gcc/sbitmap.c
> @@ -202,6 +202,39 @@ bitmap_empty_p (const_sbitmap bmap)
>    return true;
>  }
>
> +void
> +bitmap_clear_range (sbitmap bmap, unsigned int start, unsigned int count)
> +{
> +  for (unsigned int i = start; i < start + count; i++)
> +    bitmap_clear_bit (bmap, i);
> +}
> +
> +void
> +bitmap_set_range (sbitmap bmap, unsigned int start, unsigned int count)
> +{
> +  for (unsigned int i = start; i < start + count; i++)
> +    bitmap_set_bit (bmap, i);
> +}
> +
> +
> +unsigned int
> +bitmap_count_bits (const_sbitmap bmap)
> +{
> +  unsigned int count = 0;
> +  for (unsigned int i = 0; i < bmap->size; i++)
> +    if (bmap->elms[i])
> +      {
> +# if HOST_BITS_PER_WIDEST_FAST_INT == HOST_BITS_PER_LONG
> +       count += __builtin_popcountl (bmap->elms[i]);
> +# elif HOST_BITS_PER_WIDEST_FAST_INT == HOST_BITS_PER_LONGLONG
> +       count += __builtin_popcountll (bmap->elms[i]);
> +# else
> +       count += __builtin_popcount (bmap->elms[i]);
> +# endif
> +      }
> +  return count;
> +}
> +
>
>  /* Zero all elements in a bitmap.  */
>
> diff --git a/gcc/sbitmap.h b/gcc/sbitmap.h
> index bd734f9..aa43d0d 100644
> --- a/gcc/sbitmap.h
> +++ b/gcc/sbitmap.h
> @@ -231,8 +231,11 @@ extern sbitmap *sbitmap_vector_alloc (unsigned int,
> unsigned int);
>  extern sbitmap sbitmap_resize (sbitmap, unsigned int, int);
>  extern void bitmap_copy (sbitmap, const_sbitmap);
>  extern int bitmap_equal_p (const_sbitmap, const_sbitmap);
> +extern unsigned int bitmap_count_bits (const_sbitmap);
>  extern bool bitmap_empty_p (const_sbitmap);
>  extern void bitmap_clear (sbitmap);
> +extern void bitmap_clear_range (sbitmap, unsigned, unsigned);
> +extern void bitmap_set_range (sbitmap, unsigned, unsigned);
>  extern void bitmap_ones (sbitmap);
>  extern void bitmap_vector_clear (sbitmap *, unsigned int);
>  extern void bitmap_vector_ones (sbitmap *, unsigned int);
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
> b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
> index 87a2638..3155741 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
> @@ -10,4 +10,4 @@ int f(void)
>    return g(&t);
>  }
>
> -/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail
> *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
> b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
> index e2cd403..e6d027f 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
> @@ -8,4 +8,4 @@ int f(void)
>   __imag__ t = 2;
>  }
>
> -/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail
> *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
> new file mode 100644
> index 0000000..92b2df8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +int g(_Complex int*);
> +int f(void)
> +{
> +  _Complex int t = 0;
> +  int i, j;
> + __imag__ t += 2;
> +  return g(&t);
> +}
> +
> +
> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
> new file mode 100644
> index 0000000..718b746
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +int g(_Complex int*);
> +int f(void)
> +{
> +  _Complex int t = 0;
> +  int i, j;
> + __real__ t += 2;
> +  return g(&t);
> +}
> +
> +
> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c
> new file mode 100644
> index 0000000..4e14d9b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fno-tree-dce -fdump-tree-optimized" } */
> +_Complex int t = 0;
> +int f(void)
> +{
> +  t = 0;
> + __imag__ t = 2;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c
> new file mode 100644
> index 0000000..d1e0b85
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fno-tree-dce -fdump-tree-optimized" } */
> +_Complex int t = 0;
> +int f(void)
> +{
> +  t = 0;
> + __real__ t = 2;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
> index 594c20c..ae48ddd 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
> @@ -11,4 +11,4 @@ foo ()
>  }
>
>  /* We should eliminate the first assignment.  */
> -/* { dg-final { scan-tree-dump-times "VDEF" 2 "dse1" { xfail *-*-* } } } */
> +/* { dg-final { scan-tree-dump-times "VDEF" 2 "dse1" } } */
> diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
> index 778b363..40acd83 100644
> --- a/gcc/tree-ssa-dse.c
> +++ b/gcc/tree-ssa-dse.c
> @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-dfa.h"
>  #include "domwalk.h"
>  #include "tree-cfgcleanup.h"
> +#include "params.h"
>
>  /* This file implements dead store elimination.
>
> @@ -68,14 +69,242 @@ along with GCC; see the file COPYING3.  If not see
>     remove their dead edges eventually.  */
>  static bitmap need_eh_cleanup;
>
> +/* Return value from dse_classify_store */
> +enum dse_store_status
> +{
> +  DSE_STORE_LIVE,
> +  DSE_STORE_MAYBE_PARTIAL_DEAD,
> +  DSE_STORE_DEAD
> +};
> +
> +/* STMT is a statement that may write into memory.  Analyze it and
> +   initialize WRITE to describe how STMT affects memory.
> +
> +   Return TRUE if the statement was analyzed, FALSE otherwise.
> +
> +   It is always safe to return FALSE.  But typically better optimization
> +   can be achieved by analyzing more statements.  */
> +
> +static bool
> +initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write)
> +{
> +  /* It's advantageous to handle certain mem* functions.  */
> +  if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
> +    {
> +      switch (DECL_FUNCTION_CODE (gimple_call_fndecl (stmt)))
> +       {
> +         case BUILT_IN_MEMCPY:
> +         case BUILT_IN_MEMMOVE:
> +         case BUILT_IN_MEMSET:
> +           {
> +             tree size = NULL_TREE;
> +             if (gimple_call_num_args (stmt) == 3)
> +               size = gimple_call_arg (stmt, 2);
> +             tree ptr = gimple_call_arg (stmt, 0);
> +             ao_ref_init_from_ptr_and_size (write, ptr, size);
> +             return true;
> +           }
> +         default:
> +           break;
> +       }
> +    }
> +  else if (is_gimple_assign (stmt))
> +    {
> +      ao_ref_init (write, gimple_assign_lhs (stmt));
> +      return true;
> +    }
> +  return false;
> +}
> +
> +/* Given REF from the alias oracle, return TRUE if it is a valid
> +   memory reference for dead store elimination, false otherwise.
> +
> +   In particular, the reference must have a known base, known maximum
> +   size, start at a byte offset and have a size that is one or more
> +   bytes.   Finally the offset and size must be smaller than
> +   PARAM_DSE_MAX_OBJECT_SIZE.  */
> +
> +static bool
> +valid_ao_ref_for_dse (ao_ref *ref)
> +{
> +  return (ao_ref_base (ref)
> +         && ref->max_size != -1
> +         && (ref->offset % BITS_PER_UNIT) == 0
> +         && (ref->size % BITS_PER_UNIT) == 0
> +         && (ref->size / BITS_PER_UNIT) > 0);
> +}
> +
> +/* Normalize COPY (an ao_ref) relative to REF.  Essentially when we are
> +   done, COPY will only refer to bytes found within REF.
> +
> +   We have already verified that COPY intersects at least one
> +   byte with REF.  */
> +
> +static void
> +normalize_ref (ao_ref *copy, ao_ref *ref)
> +{
> +  /* If COPY starts before REF, then reset the beginning of
> +     COPY to match REF and decrease the size of COPY by the
> +     number of bytes removed from COPY.  */
> +  if (copy->offset < ref->offset)
> +    {
> +      copy->size -= (ref->offset - copy->offset);
> +      copy->offset = ref->offset;
> +    }
> +
> +  /* If COPY extends beyond REF, chop off its size appropriately.  */
> +  if (copy->offset + copy->size > ref->offset + ref->size)
> +    copy->size -= (copy->offset + copy->size - (ref->offset + ref->size));
> +}
> +
> +/* Clear any bytes written by STMT from the bitmap LIVE_BYTES.  The base
> +   address written by STMT must match the one found in REF, which must
> +   have its base address previously initialized.
> +
> +   This routine must be conservative.  If we don't know the offset or
> +   actual size written, assume nothing was written.  */
> +
> +static void
> +clear_bytes_written_by (sbitmap live_bytes, gimple *stmt, ao_ref *ref)
> +{
> +  ao_ref write;
> +  if (!initialize_ao_ref_for_dse (stmt, &write))
> +    return;
> +
> +  /* Verify we have the same base memory address, the write
> +     has a known size and overlaps with REF.  */
> +  if (valid_ao_ref_for_dse (&write)
> +      && write.base == ref->base
> +      && write.size == write.max_size
> +      && ((write.offset < ref->offset
> +          && write.offset + write.size > ref->offset)
> +         || (write.offset >= ref->offset
> +             && write.offset < ref->offset + ref->size)))
> +    {
> +      normalize_ref (&write, ref);
> +      bitmap_clear_range (live_bytes,
> +                         (write.offset - ref->offset) / BITS_PER_UNIT,
> +                         write.size / BITS_PER_UNIT);
> +    }
> +}
> +
> +/* REF is a memory write.  Extract relevant information from it and
> +   initialize the LIVE_BYTES bitmap.  If successful, return TRUE.
> +   Otherwise return FALSE.  */
> +
> +static bool
> +setup_live_bytes_from_ref (ao_ref *ref, sbitmap live_bytes)
> +{
> +  if (valid_ao_ref_for_dse (ref)
> +      && (ref->size / BITS_PER_UNIT
> +         <= PARAM_VALUE (PARAM_DSE_MAX_OBJECT_SIZE)))
> +    {
> +      bitmap_clear (live_bytes);
> +      bitmap_set_range (live_bytes, 0, ref->size / BITS_PER_UNIT);
> +      return true;
> +    }
> +  return false;
> +}
> +
> +/* Compute the number of elements that we can trim from the head and
> +   tail of ORIG resulting in a bitmap that is a superset of LIVE.
> +
> +   Store the number of elements trimmed from the head and tail in
> +   TRIM_HEAD and TRIM_TAIL.  */
> +
> +static void
> +compute_trims (ao_ref *ref, sbitmap live, int *trim_head, int *trim_tail)
> +{
> +  /* We use sbitmaps biased such that ref->offset is bit zero and the
> bitmap
> +     extends through ref->size.  So we know that in the original bitmap
> +     bits 0..ref->size were true.  We don't actually need the bitmap, just
> +     the REF to compute the trims.  */
> +
> +  /* Now identify how much, if any of the tail we can chop off.  */
> +  *trim_tail = 0;
> +  int last_orig = (ref->size / BITS_PER_UNIT) - 1;
> +  int last_live = bitmap_last_set_bit (live);
> +  *trim_tail = (last_orig - last_live) & ~0x1;
> +
> +  /* Identify how much, if any of the head we can chop off.  */
> +  int first_orig = 0;
> +  int first_live = bitmap_first_set_bit (live);
> +  *trim_head = (first_live - first_orig) & ~0x1;
> +}
> +
> +/* STMT initializes an object from COMPLEX_CST where one or more of the
> +   bytes written may be dead stores.  REF is a representation of the
> +   memory written.  LIVE is the bitmap of stores that are actually live.
> +
> +   Attempt to rewrite STMT so that only the real or imaginary part of
> +   the object is actually stored.  */
> +
> +static void
> +maybe_trim_complex_store (ao_ref *ref, sbitmap live, gimple *stmt)
> +{
> +  int trim_head, trim_tail;
> +  compute_trims (ref, live, &trim_head, &trim_tail);
> +
> +  /* The amount of data trimmed from the head or tail must be at
> +     least half the size of the object to ensure we're trimming
> +     the entire real or imaginary half.  By writing things this
> +     way we avoid more O(n) bitmap operations.  */
> +  if (trim_tail * 2 >= ref->size / BITS_PER_UNIT)
> +    {
> +      /* TREE_REALPART is live */
> +      tree x = TREE_REALPART (gimple_assign_rhs1 (stmt));
> +      tree y = gimple_assign_lhs (stmt);
> +      y = build1 (REALPART_EXPR, TREE_TYPE (x), y);
> +      gimple_assign_set_lhs (stmt, y);
> +      gimple_assign_set_rhs1 (stmt, x);
> +    }
> +  else if (trim_head * 2 >= ref->size / BITS_PER_UNIT)
> +    {
> +      /* TREE_IMAGPART is live */
> +      tree x = TREE_IMAGPART (gimple_assign_rhs1 (stmt));
> +      tree y = gimple_assign_lhs (stmt);
> +      y = build1 (IMAGPART_EXPR, TREE_TYPE (x), y);
> +      gimple_assign_set_lhs (stmt, y);
> +      gimple_assign_set_rhs1 (stmt, x);
> +    }
> +
> +  /* Other cases indicate parts of both the real and imag subobjects
> +     are live.  We do not try to optimize those cases.  */
> +}
> +
> +/* STMT is a memory write where one or more bytes written are dead
> +   stores.  ORIG is the bitmap of bytes stored by STMT.  LIVE is the
> +   bitmap of stores that are actually live.
> +
> +   Attempt to rewrite STMT so that it writes fewer memory locations.  Right
> +   now we only support trimming at the start or end of the memory region.
> +   It's not clear how much there is to be gained by trimming from the
> middle
> +   of the region.  */
> +
> +static void
> +maybe_trim_partially_dead_store (ao_ref *ref, sbitmap live, gimple *stmt)
> +{
> +  if (is_gimple_assign (stmt))
> +    {
> +      switch (gimple_assign_rhs_code (stmt))
> +       {
> +       case COMPLEX_CST:
> +         maybe_trim_complex_store (ref, live, stmt);
> +         break;
> +       default:
> +         break;
> +       }
> +    }
> +}
>
>  /* A helper of dse_optimize_stmt.
>     Given a GIMPLE_ASSIGN in STMT that writes to REF, find a candidate
>     statement *USE_STMT that may prove STMT to be dead.
>     Return TRUE if the above conditions are met, otherwise FALSE.  */
>
> -static bool
> -dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
> +static dse_store_status
> +dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
> +                   bool byte_tracking_enabled, sbitmap live_bytes)
>  {
>    gimple *temp;
>    unsigned cnt = 0;
> @@ -97,7 +326,7 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt,
> gimple **use_stmt)
>        /* Limit stmt walking to be linear in the number of possibly
>           dead stores.  */
>        if (++cnt > 256)
> -       return false;
> +       return DSE_STORE_LIVE;
>
>        if (gimple_code (temp) == GIMPLE_PHI)
>         defvar = PHI_RESULT (temp);
> @@ -135,7 +364,7 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt,
> gimple **use_stmt)
>                   fail = true;
>                   BREAK_FROM_IMM_USE_STMT (ui);
>                 }
> -             /* Do not consider the PHI as use if it dominates the
> +             /* Do not consider the PHI as use if it dominates the
>                  stmt defining the virtual operand we are processing,
>                  we have processed it already in this case.  */
>               if (gimple_bb (defvar_def) != gimple_bb (use_stmt)
> @@ -164,7 +393,13 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt,
> gimple **use_stmt)
>         }
>
>        if (fail)
> -       return false;
> +       {
> +         /* STMT might be partially dead and we may be able to reduce
> +            how many memory locations it stores into.  */
> +         if (byte_tracking_enabled && !gimple_clobber_p (stmt))
> +           return DSE_STORE_MAYBE_PARTIAL_DEAD;
> +         return DSE_STORE_LIVE;
> +       }
>
>        /* If we didn't find any definition this means the store is dead
>           if it isn't a store to global reachable memory.  In this case
> @@ -172,20 +407,100 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt,
> gimple **use_stmt)
>        if (!temp)
>         {
>           if (ref_may_alias_global_p (ref))
> -           return false;
> +           return DSE_STORE_LIVE;
>
>           temp = stmt;
>           break;
>         }
> +
> +      if (byte_tracking_enabled && temp)
> +       clear_bytes_written_by (live_bytes, temp, ref);
>      }
> -  /* Continue walking until we reach a kill.  */
> -  while (!stmt_kills_ref_p (temp, ref));
> +  /* Continue walking until we reach a full kill as a single statement
> +     or there are no more live bytes.  */
> +  while (!stmt_kills_ref_p (temp, ref)
> +        && !(byte_tracking_enabled && bitmap_empty_p (live_bytes)));
>
>    *use_stmt = temp;
> +  return DSE_STORE_DEAD;
> +}
> +
> +
> +class dse_dom_walker : public dom_walker
> +{
> +public:
> +  dse_dom_walker (cdi_direction direction) : dom_walker (direction)
> +  { m_live_bytes = sbitmap_alloc (PARAM_VALUE
> (PARAM_DSE_MAX_OBJECT_SIZE));m_byte_tracking_enabled = false; }
> +
> +  ~dse_dom_walker () { sbitmap_free (m_live_bytes); }
> +
> +  virtual edge before_dom_children (basic_block);
>
> -  return true;
> +private:
> +  sbitmap m_live_bytes;
> +  bool m_byte_tracking_enabled;
> +  void dse_optimize_stmt (gimple_stmt_iterator *);
> +};
> +
> +/* Delete a dead store STMT, which is mem* call of some kind.  */
> +static void
> +delete_dead_call (gimple *stmt)
> +{
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "  Deleted dead call: ");
> +      print_gimple_stmt (dump_file, stmt, dump_flags, 0);
> +      fprintf (dump_file, "\n");
> +    }
> +
> +  tree lhs = gimple_call_lhs (stmt);
> +  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
> +  if (lhs)
> +    {
> +      tree ptr = gimple_call_arg (stmt, 0);
> +      gimple *new_stmt = gimple_build_assign (lhs, ptr);
> +      unlink_stmt_vdef (stmt);
> +      if (gsi_replace (&gsi, new_stmt, true))
> +        bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
> +    }
> +  else
> +    {
> +      /* Then we need to fix the operand of the consuming stmt.  */
> +      unlink_stmt_vdef (stmt);
> +
> +      /* Remove the dead store.  */
> +      if (gsi_remove (&gsi, true))
> +       bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
> +      release_defs (stmt);
> +    }
>  }
>
> +/* Delete a dead store STMT, which is a gimple assignment. */
> +
> +static void
> +delete_dead_assignment (gimple *stmt)
> +{
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "  Deleted dead store: ");
> +      print_gimple_stmt (dump_file, stmt, dump_flags, 0);
> +      fprintf (dump_file, "\n");
> +    }
> +
> +  /* Then we need to fix the operand of the consuming stmt.  */
> +  unlink_stmt_vdef (stmt);
> +
> +  /* Remove the dead store.  */
> +  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
> +  basic_block bb = gimple_bb (stmt);
> +  if (gsi_remove (&gsi, true))
> +    bitmap_set_bit (need_eh_cleanup, bb->index);
> +
> +  /* And release any SSA_NAMEs set in this statement back to the
> +     SSA_NAME manager.  */
> +  release_defs (stmt);
> +}
>
>  /* Attempt to eliminate dead stores in the statement referenced by BSI.
>
> @@ -198,8 +513,8 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt,
> gimple **use_stmt)
>     is used precisely once by a later store to the same location which
>     post dominates the first store, then the first store is dead.  */
>
> -static void
> -dse_optimize_stmt (gimple_stmt_iterator *gsi)
> +void
> +dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi)
>  {
>    gimple *stmt = gsi_stmt (*gsi);
>
> @@ -214,6 +529,10 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
>           || TREE_CODE (gimple_assign_lhs (stmt)) != MEM_REF))
>      return;
>
> +  ao_ref ref;
> +  if (!initialize_ao_ref_for_dse (stmt, &ref))
> +    return;
> +
>    /* We know we have virtual definitions.  We can handle assignments and
>       some builtin calls.  */
>    if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
> @@ -225,42 +544,26 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
>           case BUILT_IN_MEMSET:
>             {
>               gimple *use_stmt;
> -             ao_ref ref;
> -             tree size = NULL_TREE;
> -             if (gimple_call_num_args (stmt) == 3)
> -               size = gimple_call_arg (stmt, 2);
> -             tree ptr = gimple_call_arg (stmt, 0);
> -             ao_ref_init_from_ptr_and_size (&ref, ptr, size);
> -             if (!dse_possible_dead_store_p (&ref, stmt, &use_stmt))
> +             enum dse_store_status store_status;
> +             m_byte_tracking_enabled
> +               = setup_live_bytes_from_ref (&ref, m_live_bytes);
> +             store_status = dse_classify_store (&ref, stmt, &use_stmt,
> +                                                m_byte_tracking_enabled,
> +                                                m_live_bytes);
> +             if (store_status == DSE_STORE_LIVE)
>                 return;
>
> -             if (dump_file && (dump_flags & TDF_DETAILS))
> -               {
> -                 fprintf (dump_file, "  Deleted dead call: ");
> -                 print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags,
> 0);
> -                 fprintf (dump_file, "\n");
> -               }
> -
> -             tree lhs = gimple_call_lhs (stmt);
> -             if (lhs)
> +             if (store_status == DSE_STORE_MAYBE_PARTIAL_DEAD)
>                 {
> -                 gimple *new_stmt = gimple_build_assign (lhs, ptr);
> -                 unlink_stmt_vdef (stmt);
> -                 if (gsi_replace (gsi, new_stmt, true))
> -                   bitmap_set_bit (need_eh_cleanup, gimple_bb
> (stmt)->index);
> +                 maybe_trim_partially_dead_store (&ref, m_live_bytes,
> stmt);
> +                 return;
>                 }
> -             else
> -               {
> -                 /* Then we need to fix the operand of the consuming stmt.
> */
> -                 unlink_stmt_vdef (stmt);
>
> -                 /* Remove the dead store.  */
> -                 if (gsi_remove (gsi, true))
> -                   bitmap_set_bit (need_eh_cleanup, gimple_bb
> (stmt)->index);
> -                 release_defs (stmt);
> -               }
> -             break;
> +             if (store_status == DSE_STORE_DEAD)
> +               delete_dead_call (stmt);
> +             return;
>             }
> +
>           default:
>             return;
>         }
> @@ -276,10 +579,20 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
>         use_stmt = stmt;
>        else
>         {
> -         ao_ref ref;
> -         ao_ref_init (&ref, gimple_assign_lhs (stmt));
> -         if (!dse_possible_dead_store_p (&ref, stmt, &use_stmt))
> +         m_byte_tracking_enabled
> +           = setup_live_bytes_from_ref (&ref, m_live_bytes);
> +         enum dse_store_status store_status;
> +         store_status = dse_classify_store (&ref, stmt, &use_stmt,
> +                                            m_byte_tracking_enabled,
> +                                            m_live_bytes);
> +         if (store_status == DSE_STORE_LIVE)
>             return;
> +
> +         if (store_status == DSE_STORE_MAYBE_PARTIAL_DEAD)
> +           {
> +             maybe_trim_partially_dead_store (&ref, m_live_bytes, stmt);
> +             return;
> +           }
>         }
>
>        /* Now we know that use_stmt kills the LHS of stmt.  */
> @@ -290,35 +603,10 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
>           && !gimple_clobber_p (use_stmt))
>         return;
>
> -      if (dump_file && (dump_flags & TDF_DETAILS))
> -       {
> -         fprintf (dump_file, "  Deleted dead store: ");
> -         print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0);
> -         fprintf (dump_file, "\n");
> -       }
> -
> -      /* Then we need to fix the operand of the consuming stmt.  */
> -      unlink_stmt_vdef (stmt);
> -
> -      /* Remove the dead store.  */
> -      basic_block bb = gimple_bb (stmt);
> -      if (gsi_remove (gsi, true))
> -       bitmap_set_bit (need_eh_cleanup, bb->index);
> -
> -      /* And release any SSA_NAMEs set in this statement back to the
> -        SSA_NAME manager.  */
> -      release_defs (stmt);
> +      delete_dead_assignment (stmt);
>      }
>  }
>
> -class dse_dom_walker : public dom_walker
> -{
> -public:
> -  dse_dom_walker (cdi_direction direction) : dom_walker (direction) {}
> -
> -  virtual edge before_dom_children (basic_block);
> -};
> -
>  edge
>  dse_dom_walker::before_dom_children (basic_block bb)
>  {
> @@ -391,7 +679,7 @@ pass_dse::execute (function *fun)
>      }
>
>    BITMAP_FREE (need_eh_cleanup);
> -
> +
>    /* For now, just wipe the post-dominator information.  */
>    free_dominance_info (CDI_POST_DOMINATORS);
>    return 0;
>
Richard Biener Jan. 4, 2017, 1:23 p.m. UTC | #2
On Wed, Jan 4, 2017 at 2:22 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, Dec 22, 2016 at 7:26 AM, Jeff Law <law@redhat.com> wrote:
>> This is the first of the 4 part patchkit to address deficiencies in our DSE
>> implementation.
>>
>> This patch addresses the P2 regression 33562 which has been a low priority
>> regression since gcc-4.3.  To summarize, DSE no longer has the ability to
>> detect an aggregate store as dead if subsequent stores are done in a
>> piecemeal fashion.
>>
>> I originally tackled this by changing how we lower complex objects. That was
>> sufficient to address 33562, but was reasonably rejected.
>>
>> This version attacks the problem by improving DSE to track stores to memory
>> at a byte level.  That allows us to determine if a series of stores
>> completely covers an earlier store (thus making the earlier store dead).
>>
>> A useful side effect of this is we can detect when parts of a store are dead
>> and potentially rewrite the store.  This patch implements that for complex
>> object initializations.  While not strictly part of 33562, it's so closely
>> related that I felt it belongs as part of this patch.
>>
>> This originally limited the size of the tracked memory space to 64 bytes.  I
>> bumped the limit after working through the CONSTRUCTOR and mem* trimming
>> patches.  The 256 byte limit is still fairly arbitrary and I wouldn't lose
>> sleep if we throttled back to 64 or 128 bytes.
>>
>> Later patches in the kit will build upon this patch.  So if pieces look like
>> skeleton code, that's because it is.
>>
>> The changes since the V2 patch are:
>>
>> 1. Using sbitmaps rather than bitmaps.
>> 2. Returning a tri-state from dse_classify_store (renamed from
>> dse_possible_dead_store_p)
>> 3. More efficient trim computation
>> 4. Moving trimming code out of dse_classify_store
>> 5. Refactoring code to delete dead calls/assignments
>> 6. dse_optimize_stmt moves into the dse_dom_walker class
>>
>> Not surprisingly, this patch has most of the changes based on prior feedback
>> as it includes the raw infrastructure.
>>
>> Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?
>
> New functions in sbitmap.c lack function comments.
>
> bitmap_count_bits fails to guard against GCC_VERSION >= 3400 (the version
> is not important, but non-GCC host compilers are).  See bitmap.c for a
> fallback.
>
> Both bitmap_clear_range and bitmap_set_range look rather inefficient...
> (it's not likely anybody will clean this up after you)
>
> I'd say split out the sbitmap.[ch] changes.
>
> +DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
> +        "dse-max-object-size",
> +        "Maximum size (in bytes) of objects tracked by dead store
> elimination.",
> +        256, 0, 0)
>
> the docs suggest that DSE doesn't handle larger stores but it does (just in
> the original limited way).  Maybe "tracked bytewise" is better.

Oh, and new --params need documenting in invoke.texi.

Richard.

> +static bool
> +valid_ao_ref_for_dse (ao_ref *ref)
> +{
> +  return (ao_ref_base (ref)
> +         && ref->max_size != -1
> +         && (ref->offset % BITS_PER_UNIT) == 0
> +         && (ref->size % BITS_PER_UNIT) == 0
> +         && (ref->size / BITS_PER_UNIT) > 0);
>
> I think the last test is better written as ref->size != -1.
>
> Btw, seeing you discount non-byte size/offset stores this somehow asks
> for store-merging being done before the last DSE (it currently runs after).
> Sth to keep in mind for GCC 8.
>
> +/* Delete a dead store STMT, which is mem* call of some kind.  */
>
> call STMT
>
> +static void
> +delete_dead_call (gimple *stmt)
> +{
> +
> excess vertical space
> ......
> +  if (lhs)
> +    {
> +      tree ptr = gimple_call_arg (stmt, 0);
> +      gimple *new_stmt = gimple_build_assign (lhs, ptr);
> +      unlink_stmt_vdef (stmt);
> +      if (gsi_replace (&gsi, new_stmt, true))
> +        bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
>
>   release_ssa_name (gimple_vdef (stmt));
>
>
> +  { m_live_bytes = sbitmap_alloc (PARAM_VALUE
> (PARAM_DSE_MAX_OBJECT_SIZE));m_byte_tracking_enabled = false; }
>
> formatting.
>
> The DSE parts look good to me with the nits above fixed.  Just re-spin
> the sbitmap.[ch] part please.
>
> Thanks,
> Richard.
>
>>
>>         PR tree-optimization/33562
>>         * params.def (PARAM_DSE_MAX_OBJECT_SIZE): New PARAM.
>>         * sbitmap.h (bitmap_clear_range, bitmap_set_range): Prototype new
>>         functions.
>>         (bitmap_count_bits): Likewise.
>>         * sbitmap.c (bitmap_clear_range, bitmap_set_range): New functions.
>>         (bitmap_count_bits): Likewise.
>>         * tree-ssa-dse.c: Include params.h.
>>         (dse_store_status): New enum.
>>         (initialize_ao_ref_for_dse): New, partially extracted from
>>         dse_optimize_stmt.
>>         (valid_ao_ref_for_dse, normalize_ref): New.
>>         (setup_live_bytes_from_ref, compute_trims): Likewise.
>>         (clear_bytes_written_by, trim_complex_store): Likewise.
>>         (maybe_trim_partially_dead_store): Likewise.
>>         (maybe_trim_complex_store): Likewise.
>>         (dse_classify_store): Renamed from dse_possible_dead_store_p.
>>         Track what bytes live from the original store.  Return tri-state
>>         for dead, partially dead or live.
>>         (dse_dom_walker): Add constructor, destructor and new private
>> members.
>>         (delete_dead_call, delete_dead_assignment): New extracted from
>>         dse_optimize_stmt.
>>         (dse_optimize_stmt): Make a member of dse_dom_walker.
>>         Use initialize_ao_ref_for_dse.
>>
>>
>>         * gcc.dg/tree-ssa/complex-4.c: No longer xfailed.
>>         * gcc.dg/tree-ssa/complex-5.c: Likewise.
>>         * gcc.dg/tree-ssa/ssa-dse-9.c: Likewise.
>>         * gcc.dg/tree-ssa/ssa-dse-18.c: New test.
>>         * gcc.dg/tree-ssa/ssa-dse-19.c: Likewise.
>>         * gcc.dg/tree-ssa/ssa-dse-20.c: Likewise.
>>         * gcc.dg/tree-ssa/ssa-dse-21.c: Likewise.
>>
>> diff --git a/gcc/params.def b/gcc/params.def
>> index 50f75a7..f367c1d 100644
>> --- a/gcc/params.def
>> +++ b/gcc/params.def
>> @@ -532,6 +532,11 @@ DEFPARAM(PARAM_AVG_LOOP_NITER,
>>          "Average number of iterations of a loop.",
>>          10, 1, 0)
>>
>> +DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
>> +        "dse-max-object-size",
>> +        "Maximum size (in bytes) of objects tracked by dead store
>> elimination.",
>> +        256, 0, 0)
>> +
>>  DEFPARAM(PARAM_SCEV_MAX_EXPR_SIZE,
>>          "scev-max-expr-size",
>>          "Bound on size of expressions used in the scalar evolutions
>> analyzer.",
>> diff --git a/gcc/sbitmap.c b/gcc/sbitmap.c
>> index 10b4347..2b66a6c 100644
>> --- a/gcc/sbitmap.c
>> +++ b/gcc/sbitmap.c
>> @@ -202,6 +202,39 @@ bitmap_empty_p (const_sbitmap bmap)
>>    return true;
>>  }
>>
>> +void
>> +bitmap_clear_range (sbitmap bmap, unsigned int start, unsigned int count)
>> +{
>> +  for (unsigned int i = start; i < start + count; i++)
>> +    bitmap_clear_bit (bmap, i);
>> +}
>> +
>> +void
>> +bitmap_set_range (sbitmap bmap, unsigned int start, unsigned int count)
>> +{
>> +  for (unsigned int i = start; i < start + count; i++)
>> +    bitmap_set_bit (bmap, i);
>> +}
>> +
>> +
>> +unsigned int
>> +bitmap_count_bits (const_sbitmap bmap)
>> +{
>> +  unsigned int count = 0;
>> +  for (unsigned int i = 0; i < bmap->size; i++)
>> +    if (bmap->elms[i])
>> +      {
>> +# if HOST_BITS_PER_WIDEST_FAST_INT == HOST_BITS_PER_LONG
>> +       count += __builtin_popcountl (bmap->elms[i]);
>> +# elif HOST_BITS_PER_WIDEST_FAST_INT == HOST_BITS_PER_LONGLONG
>> +       count += __builtin_popcountll (bmap->elms[i]);
>> +# else
>> +       count += __builtin_popcount (bmap->elms[i]);
>> +# endif
>> +      }
>> +  return count;
>> +}
>> +
>>
>>  /* Zero all elements in a bitmap.  */
>>
>> diff --git a/gcc/sbitmap.h b/gcc/sbitmap.h
>> index bd734f9..aa43d0d 100644
>> --- a/gcc/sbitmap.h
>> +++ b/gcc/sbitmap.h
>> @@ -231,8 +231,11 @@ extern sbitmap *sbitmap_vector_alloc (unsigned int,
>> unsigned int);
>>  extern sbitmap sbitmap_resize (sbitmap, unsigned int, int);
>>  extern void bitmap_copy (sbitmap, const_sbitmap);
>>  extern int bitmap_equal_p (const_sbitmap, const_sbitmap);
>> +extern unsigned int bitmap_count_bits (const_sbitmap);
>>  extern bool bitmap_empty_p (const_sbitmap);
>>  extern void bitmap_clear (sbitmap);
>> +extern void bitmap_clear_range (sbitmap, unsigned, unsigned);
>> +extern void bitmap_set_range (sbitmap, unsigned, unsigned);
>>  extern void bitmap_ones (sbitmap);
>>  extern void bitmap_vector_clear (sbitmap *, unsigned int);
>>  extern void bitmap_vector_ones (sbitmap *, unsigned int);
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
>> index 87a2638..3155741 100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
>> @@ -10,4 +10,4 @@ int f(void)
>>    return g(&t);
>>  }
>>
>> -/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail
>> *-*-* } } } */
>> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
>> index e2cd403..e6d027f 100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
>> @@ -8,4 +8,4 @@ int f(void)
>>   __imag__ t = 2;
>>  }
>>
>> -/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail
>> *-*-* } } } */
>> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
>> new file mode 100644
>> index 0000000..92b2df8
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
>> @@ -0,0 +1,15 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>> +int g(_Complex int*);
>> +int f(void)
>> +{
>> +  _Complex int t = 0;
>> +  int i, j;
>> + __imag__ t += 2;
>> +  return g(&t);
>> +}
>> +
>> +
>> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
>> new file mode 100644
>> index 0000000..718b746
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
>> @@ -0,0 +1,15 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>> +int g(_Complex int*);
>> +int f(void)
>> +{
>> +  _Complex int t = 0;
>> +  int i, j;
>> + __real__ t += 2;
>> +  return g(&t);
>> +}
>> +
>> +
>> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c
>> new file mode 100644
>> index 0000000..4e14d9b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O1 -fno-tree-dce -fdump-tree-optimized" } */
>> +_Complex int t = 0;
>> +int f(void)
>> +{
>> +  t = 0;
>> + __imag__ t = 2;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
>> +
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c
>> new file mode 100644
>> index 0000000..d1e0b85
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c
>> @@ -0,0 +1,12 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O1 -fno-tree-dce -fdump-tree-optimized" } */
>> +_Complex int t = 0;
>> +int f(void)
>> +{
>> +  t = 0;
>> + __real__ t = 2;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
>> index 594c20c..ae48ddd 100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
>> @@ -11,4 +11,4 @@ foo ()
>>  }
>>
>>  /* We should eliminate the first assignment.  */
>> -/* { dg-final { scan-tree-dump-times "VDEF" 2 "dse1" { xfail *-*-* } } } */
>> +/* { dg-final { scan-tree-dump-times "VDEF" 2 "dse1" } } */
>> diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
>> index 778b363..40acd83 100644
>> --- a/gcc/tree-ssa-dse.c
>> +++ b/gcc/tree-ssa-dse.c
>> @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree-dfa.h"
>>  #include "domwalk.h"
>>  #include "tree-cfgcleanup.h"
>> +#include "params.h"
>>
>>  /* This file implements dead store elimination.
>>
>> @@ -68,14 +69,242 @@ along with GCC; see the file COPYING3.  If not see
>>     remove their dead edges eventually.  */
>>  static bitmap need_eh_cleanup;
>>
>> +/* Return value from dse_classify_store */
>> +enum dse_store_status
>> +{
>> +  DSE_STORE_LIVE,
>> +  DSE_STORE_MAYBE_PARTIAL_DEAD,
>> +  DSE_STORE_DEAD
>> +};
>> +
>> +/* STMT is a statement that may write into memory.  Analyze it and
>> +   initialize WRITE to describe how STMT affects memory.
>> +
>> +   Return TRUE if the statement was analyzed, FALSE otherwise.
>> +
>> +   It is always safe to return FALSE.  But typically better optimization
>> +   can be achieved by analyzing more statements.  */
>> +
>> +static bool
>> +initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write)
>> +{
>> +  /* It's advantageous to handle certain mem* functions.  */
>> +  if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
>> +    {
>> +      switch (DECL_FUNCTION_CODE (gimple_call_fndecl (stmt)))
>> +       {
>> +         case BUILT_IN_MEMCPY:
>> +         case BUILT_IN_MEMMOVE:
>> +         case BUILT_IN_MEMSET:
>> +           {
>> +             tree size = NULL_TREE;
>> +             if (gimple_call_num_args (stmt) == 3)
>> +               size = gimple_call_arg (stmt, 2);
>> +             tree ptr = gimple_call_arg (stmt, 0);
>> +             ao_ref_init_from_ptr_and_size (write, ptr, size);
>> +             return true;
>> +           }
>> +         default:
>> +           break;
>> +       }
>> +    }
>> +  else if (is_gimple_assign (stmt))
>> +    {
>> +      ao_ref_init (write, gimple_assign_lhs (stmt));
>> +      return true;
>> +    }
>> +  return false;
>> +}
>> +
>> +/* Given REF from the alias oracle, return TRUE if it is a valid
>> +   memory reference for dead store elimination, false otherwise.
>> +
>> +   In particular, the reference must have a known base, known maximum
>> +   size, start at a byte offset and have a size that is one or more
>> +   bytes.   Finally the offset and size must be smaller than
>> +   PARAM_DSE_MAX_OBJECT_SIZE.  */
>> +
>> +static bool
>> +valid_ao_ref_for_dse (ao_ref *ref)
>> +{
>> +  return (ao_ref_base (ref)
>> +         && ref->max_size != -1
>> +         && (ref->offset % BITS_PER_UNIT) == 0
>> +         && (ref->size % BITS_PER_UNIT) == 0
>> +         && (ref->size / BITS_PER_UNIT) > 0);
>> +}
>> +
>> +/* Normalize COPY (an ao_ref) relative to REF.  Essentially when we are
>> +   done COPY will only refer to bytes found within REF.
>> +
>> +   We have already verified that COPY intersects at least one
>> +   byte with REF.  */
>> +
>> +static void
>> +normalize_ref (ao_ref *copy, ao_ref *ref)
>> +{
>> +  /* If COPY starts before REF, then reset the beginning of
>> +     COPY to match REF and decrease the size of COPY by the
>> +     number of bytes removed from COPY.  */
>> +  if (copy->offset < ref->offset)
>> +    {
>> +      copy->size -= (ref->offset - copy->offset);
>> +      copy->offset = ref->offset;
>> +    }
>> +
>> +  /* If COPY extends beyond REF, chop off its size appropriately.  */
>> +  if (copy->offset + copy->size > ref->offset + ref->size)
>> +    copy->size -= (copy->offset + copy->size - (ref->offset + ref->size));
>> +}
>> +
>> +/* Clear any bytes written by STMT from the bitmap LIVE_BYTES.  The base
>> +   address written by STMT must match the one found in REF, which must
>> +   have its base address previously initialized.
>> +
>> +   This routine must be conservative.  If we don't know the offset or
>> +   actual size written, assume nothing was written.  */
>> +
>> +static void
>> +clear_bytes_written_by (sbitmap live_bytes, gimple *stmt, ao_ref *ref)
>> +{
>> +  ao_ref write;
>> +  if (!initialize_ao_ref_for_dse (stmt, &write))
>> +    return;
>> +
>> +  /* Verify we have the same base memory address, the write
>> +     has a known size and overlaps with REF.  */
>> +  if (valid_ao_ref_for_dse (&write)
>> +      && write.base == ref->base
>> +      && write.size == write.max_size
>> +      && ((write.offset < ref->offset
>> +          && write.offset + write.size > ref->offset)
>> +         || (write.offset >= ref->offset
>> +             && write.offset < ref->offset + ref->size)))
>> +    {
>> +      normalize_ref (&write, ref);
>> +      bitmap_clear_range (live_bytes,
>> +                         (write.offset - ref->offset) / BITS_PER_UNIT,
>> +                         write.size / BITS_PER_UNIT);
>> +    }
>> +}
>> +
>> +/* REF is a memory write.  Extract relevant information from it and
>> +   initialize the LIVE_BYTES bitmap.  If successful, return TRUE.
>> +   Otherwise return FALSE.  */
>> +
>> +static bool
>> +setup_live_bytes_from_ref (ao_ref *ref, sbitmap live_bytes)
>> +{
>> +  if (valid_ao_ref_for_dse (ref)
>> +      && (ref->size / BITS_PER_UNIT
>> +         <= PARAM_VALUE (PARAM_DSE_MAX_OBJECT_SIZE)))
>> +    {
>> +      bitmap_clear (live_bytes);
>> +      bitmap_set_range (live_bytes, 0, ref->size / BITS_PER_UNIT);
>> +      return true;
>> +    }
>> +  return false;
>> +}
>> +
>> +/* Compute the number of elements that we can trim from the head and
>> +   tail of ORIG resulting in a bitmap that is a superset of LIVE.
>> +
>> +   Store the number of elements trimmed from the head and tail in
>> +   TRIM_HEAD and TRIM_TAIL.  */
>> +
>> +static void
>> +compute_trims (ao_ref *ref, sbitmap live, int *trim_head, int *trim_tail)
>> +{
>> +  /* We use sbitmaps biased such that ref->offset is bit zero and the
>> bitmap
>> +     extends through ref->size.  So we know that in the original bitmap
>> +     bits 0..ref->size were true.  We don't actually need the bitmap, just
>> +     the REF to compute the trims.  */
>> +
>> +  /* Now identify how much, if any of the tail we can chop off.  */
>> +  *trim_tail = 0;
>> +  int last_orig = (ref->size / BITS_PER_UNIT) - 1;
>> +  int last_live = bitmap_last_set_bit (live);
>> +  *trim_tail = (last_orig - last_live) & ~0x1;
>> +
>> +  /* Identify how much, if any of the head we can chop off.  */
>> +  int first_orig = 0;
>> +  int first_live = bitmap_first_set_bit (live);
>> +  *trim_head = (first_live - first_orig) & ~0x1;
>> +}
>> +
>> +/* STMT initializes an object from COMPLEX_CST where one or more of the
>> +   bytes written may be dead stores.  REF is a representation of the
>> +   memory written.  LIVE is the bitmap of stores that are actually live.
>> +
>> +   Attempt to rewrite STMT so that only the real or imaginary part of
>> +   the object is actually stored.  */
>> +
>> +static void
>> +maybe_trim_complex_store (ao_ref *ref, sbitmap live, gimple *stmt)
>> +{
>> +  int trim_head, trim_tail;
>> +  compute_trims (ref, live, &trim_head, &trim_tail);
>> +
>> +  /* The amount of data trimmed from the head or tail must be at
>> +     least half the size of the object to ensure we're trimming
>> +     the entire real or imaginary half.  By writing things this
>> +     way we avoid more O(n) bitmap operations.  */
>> +  if (trim_tail * 2 >= ref->size / BITS_PER_UNIT)
>> +    {
>> +      /* TREE_REALPART is live */
>> +      tree x = TREE_REALPART (gimple_assign_rhs1 (stmt));
>> +      tree y = gimple_assign_lhs (stmt);
>> +      y = build1 (REALPART_EXPR, TREE_TYPE (x), y);
>> +      gimple_assign_set_lhs (stmt, y);
>> +      gimple_assign_set_rhs1 (stmt, x);
>> +    }
>> +  else if (trim_head * 2 >= ref->size / BITS_PER_UNIT)
>> +    {
>> +      /* TREE_IMAGPART is live */
>> +      tree x = TREE_IMAGPART (gimple_assign_rhs1 (stmt));
>> +      tree y = gimple_assign_lhs (stmt);
>> +      y = build1 (IMAGPART_EXPR, TREE_TYPE (x), y);
>> +      gimple_assign_set_lhs (stmt, y);
>> +      gimple_assign_set_rhs1 (stmt, x);
>> +    }
>> +
>> +  /* Other cases indicate parts of both the real and imag subobjects
>> +     are live.  We do not try to optimize those cases.  */
>> +}
>> +
>> +/* STMT is a memory write where one or more bytes written are dead
>> +   stores.  ORIG is the bitmap of bytes stored by STMT.  LIVE is the
>> +   bitmap of stores that are actually live.
>> +
>> +   Attempt to rewrite STMT so that it writes fewer memory locations.  Right
>> +   now we only support trimming at the start or end of the memory region.
>> +   It's not clear how much there is to be gained by trimming from the
>> middle
>> +   of the region.  */
>> +
>> +static void
>> +maybe_trim_partially_dead_store (ao_ref *ref, sbitmap live, gimple *stmt)
>> +{
>> +  if (is_gimple_assign (stmt))
>> +    {
>> +      switch (gimple_assign_rhs_code (stmt))
>> +       {
>> +       case COMPLEX_CST:
>> +         maybe_trim_complex_store (ref, live, stmt);
>> +         break;
>> +       default:
>> +         break;
>> +       }
>> +    }
>> +}
>>
>>  /* A helper of dse_optimize_stmt.
>>     Given a GIMPLE_ASSIGN in STMT that writes to REF, find a candidate
>>     statement *USE_STMT that may prove STMT to be dead.
>>     Return TRUE if the above conditions are met, otherwise FALSE.  */
>>
>> -static bool
>> -dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
>> +static dse_store_status
>> +dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
>> +                   bool byte_tracking_enabled, sbitmap live_bytes)
>>  {
>>    gimple *temp;
>>    unsigned cnt = 0;
>> @@ -97,7 +326,7 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt,
>> gimple **use_stmt)
>>        /* Limit stmt walking to be linear in the number of possibly
>>           dead stores.  */
>>        if (++cnt > 256)
>> -       return false;
>> +       return DSE_STORE_LIVE;
>>
>>        if (gimple_code (temp) == GIMPLE_PHI)
>>         defvar = PHI_RESULT (temp);
>> @@ -135,7 +364,7 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt,
>> gimple **use_stmt)
>>                   fail = true;
>>                   BREAK_FROM_IMM_USE_STMT (ui);
>>                 }
>> -             /* Do not consider the PHI as use if it dominates the
>> +             /* Do not consider the PHI as use if it dominates the
>>                  stmt defining the virtual operand we are processing,
>>                  we have processed it already in this case.  */
>>               if (gimple_bb (defvar_def) != gimple_bb (use_stmt)
>> @@ -164,7 +393,13 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt,
>> gimple **use_stmt)
>>         }
>>
>>        if (fail)
>> -       return false;
>> +       {
>> +         /* STMT might be partially dead and we may be able to reduce
>> +            how many memory locations it stores into.  */
>> +         if (byte_tracking_enabled && !gimple_clobber_p (stmt))
>> +           return DSE_STORE_MAYBE_PARTIAL_DEAD;
>> +         return DSE_STORE_LIVE;
>> +       }
>>
>>        /* If we didn't find any definition this means the store is dead
>>           if it isn't a store to global reachable memory.  In this case
>> @@ -172,20 +407,100 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt,
>> gimple **use_stmt)
>>        if (!temp)
>>         {
>>           if (ref_may_alias_global_p (ref))
>> -           return false;
>> +           return DSE_STORE_LIVE;
>>
>>           temp = stmt;
>>           break;
>>         }
>> +
>> +      if (byte_tracking_enabled && temp)
>> +       clear_bytes_written_by (live_bytes, temp, ref);
>>      }
>> -  /* Continue walking until we reach a kill.  */
>> -  while (!stmt_kills_ref_p (temp, ref));
>> +  /* Continue walking until we reach a full kill as a single statement
>> +     or there are no more live bytes.  */
>> +  while (!stmt_kills_ref_p (temp, ref)
>> +        && !(byte_tracking_enabled && bitmap_empty_p (live_bytes)));
>>
>>    *use_stmt = temp;
>> +  return DSE_STORE_DEAD;
>> +}
>> +
>> +
>> +class dse_dom_walker : public dom_walker
>> +{
>> +public:
>> +  dse_dom_walker (cdi_direction direction) : dom_walker (direction)
>> +  { m_live_bytes = sbitmap_alloc (PARAM_VALUE
>> (PARAM_DSE_MAX_OBJECT_SIZE));m_byte_tracking_enabled = false; }
>> +
>> +  ~dse_dom_walker () { sbitmap_free (m_live_bytes); }
>> +
>> +  virtual edge before_dom_children (basic_block);
>>
>> -  return true;
>> +private:
>> +  sbitmap m_live_bytes;
>> +  bool m_byte_tracking_enabled;
>> +  void dse_optimize_stmt (gimple_stmt_iterator *);
>> +};
>> +
>> +/* Delete a dead store STMT, which is mem* call of some kind.  */
>> +static void
>> +delete_dead_call (gimple *stmt)
>> +{
>> +
>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>> +    {
>> +      fprintf (dump_file, "  Deleted dead call: ");
>> +      print_gimple_stmt (dump_file, stmt, dump_flags, 0);
>> +      fprintf (dump_file, "\n");
>> +    }
>> +
>> +  tree lhs = gimple_call_lhs (stmt);
>> +  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
>> +  if (lhs)
>> +    {
>> +      tree ptr = gimple_call_arg (stmt, 0);
>> +      gimple *new_stmt = gimple_build_assign (lhs, ptr);
>> +      unlink_stmt_vdef (stmt);
>> +      if (gsi_replace (&gsi, new_stmt, true))
>> +        bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
>> +    }
>> +  else
>> +    {
>> +      /* Then we need to fix the operand of the consuming stmt.  */
>> +      unlink_stmt_vdef (stmt);
>> +
>> +      /* Remove the dead store.  */
>> +      if (gsi_remove (&gsi, true))
>> +       bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
>> +      release_defs (stmt);
>> +    }
>>  }
>>
>> +/* Delete a dead store STMT, which is a gimple assignment. */
>> +
>> +static void
>> +delete_dead_assignment (gimple *stmt)
>> +{
>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>> +    {
>> +      fprintf (dump_file, "  Deleted dead store: ");
>> +      print_gimple_stmt (dump_file, stmt, dump_flags, 0);
>> +      fprintf (dump_file, "\n");
>> +    }
>> +
>> +  /* Then we need to fix the operand of the consuming stmt.  */
>> +  unlink_stmt_vdef (stmt);
>> +
>> +  /* Remove the dead store.  */
>> +  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
>> +  basic_block bb = gimple_bb (stmt);
>> +  if (gsi_remove (&gsi, true))
>> +    bitmap_set_bit (need_eh_cleanup, bb->index);
>> +
>> +  /* And release any SSA_NAMEs set in this statement back to the
>> +     SSA_NAME manager.  */
>> +  release_defs (stmt);
>> +}
>>
>>  /* Attempt to eliminate dead stores in the statement referenced by BSI.
>>
>> @@ -198,8 +513,8 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt,
>> gimple **use_stmt)
>>     is used precisely once by a later store to the same location which
>>     post dominates the first store, then the first store is dead.  */
>>
>> -static void
>> -dse_optimize_stmt (gimple_stmt_iterator *gsi)
>> +void
>> +dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi)
>>  {
>>    gimple *stmt = gsi_stmt (*gsi);
>>
>> @@ -214,6 +529,10 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
>>           || TREE_CODE (gimple_assign_lhs (stmt)) != MEM_REF))
>>      return;
>>
>> +  ao_ref ref;
>> +  if (!initialize_ao_ref_for_dse (stmt, &ref))
>> +    return;
>> +
>>    /* We know we have virtual definitions.  We can handle assignments and
>>       some builtin calls.  */
>>    if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
>> @@ -225,42 +544,26 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
>>           case BUILT_IN_MEMSET:
>>             {
>>               gimple *use_stmt;
>> -             ao_ref ref;
>> -             tree size = NULL_TREE;
>> -             if (gimple_call_num_args (stmt) == 3)
>> -               size = gimple_call_arg (stmt, 2);
>> -             tree ptr = gimple_call_arg (stmt, 0);
>> -             ao_ref_init_from_ptr_and_size (&ref, ptr, size);
>> -             if (!dse_possible_dead_store_p (&ref, stmt, &use_stmt))
>> +             enum dse_store_status store_status;
>> +             m_byte_tracking_enabled
>> +               = setup_live_bytes_from_ref (&ref, m_live_bytes);
>> +             store_status = dse_classify_store (&ref, stmt, &use_stmt,
>> +                                                m_byte_tracking_enabled,
>> +                                                m_live_bytes);
>> +             if (store_status == DSE_STORE_LIVE)
>>                 return;
>>
>> -             if (dump_file && (dump_flags & TDF_DETAILS))
>> -               {
>> -                 fprintf (dump_file, "  Deleted dead call: ");
>> -                 print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags,
>> 0);
>> -                 fprintf (dump_file, "\n");
>> -               }
>> -
>> -             tree lhs = gimple_call_lhs (stmt);
>> -             if (lhs)
>> +             if (store_status == DSE_STORE_MAYBE_PARTIAL_DEAD)
>>                 {
>> -                 gimple *new_stmt = gimple_build_assign (lhs, ptr);
>> -                 unlink_stmt_vdef (stmt);
>> -                 if (gsi_replace (gsi, new_stmt, true))
>> -                   bitmap_set_bit (need_eh_cleanup, gimple_bb
>> (stmt)->index);
>> +                 maybe_trim_partially_dead_store (&ref, m_live_bytes,
>> stmt);
>> +                 return;
>>                 }
>> -             else
>> -               {
>> -                 /* Then we need to fix the operand of the consuming stmt.
>> */
>> -                 unlink_stmt_vdef (stmt);
>>
>> -                 /* Remove the dead store.  */
>> -                 if (gsi_remove (gsi, true))
>> -                   bitmap_set_bit (need_eh_cleanup, gimple_bb
>> (stmt)->index);
>> -                 release_defs (stmt);
>> -               }
>> -             break;
>> +             if (store_status == DSE_STORE_DEAD)
>> +               delete_dead_call (stmt);
>> +             return;
>>             }
>> +
>>           default:
>>             return;
>>         }
>> @@ -276,10 +579,20 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
>>         use_stmt = stmt;
>>        else
>>         {
>> -         ao_ref ref;
>> -         ao_ref_init (&ref, gimple_assign_lhs (stmt));
>> -         if (!dse_possible_dead_store_p (&ref, stmt, &use_stmt))
>> +         m_byte_tracking_enabled
>> +           = setup_live_bytes_from_ref (&ref, m_live_bytes);
>> +         enum dse_store_status store_status;
>> +         store_status = dse_classify_store (&ref, stmt, &use_stmt,
>> +                                            m_byte_tracking_enabled,
>> +                                            m_live_bytes);
>> +         if (store_status == DSE_STORE_LIVE)
>>             return;
>> +
>> +         if (store_status == DSE_STORE_MAYBE_PARTIAL_DEAD)
>> +           {
>> +             maybe_trim_partially_dead_store (&ref, m_live_bytes, stmt);
>> +             return;
>> +           }
>>         }
>>
>>        /* Now we know that use_stmt kills the LHS of stmt.  */
>> @@ -290,35 +603,10 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
>>           && !gimple_clobber_p (use_stmt))
>>         return;
>>
>> -      if (dump_file && (dump_flags & TDF_DETAILS))
>> -       {
>> -         fprintf (dump_file, "  Deleted dead store: ");
>> -         print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0);
>> -         fprintf (dump_file, "\n");
>> -       }
>> -
>> -      /* Then we need to fix the operand of the consuming stmt.  */
>> -      unlink_stmt_vdef (stmt);
>> -
>> -      /* Remove the dead store.  */
>> -      basic_block bb = gimple_bb (stmt);
>> -      if (gsi_remove (gsi, true))
>> -       bitmap_set_bit (need_eh_cleanup, bb->index);
>> -
>> -      /* And release any SSA_NAMEs set in this statement back to the
>> -        SSA_NAME manager.  */
>> -      release_defs (stmt);
>> +      delete_dead_assignment (stmt);
>>      }
>>  }
>>
>> -class dse_dom_walker : public dom_walker
>> -{
>> -public:
>> -  dse_dom_walker (cdi_direction direction) : dom_walker (direction) {}
>> -
>> -  virtual edge before_dom_children (basic_block);
>> -};
>> -
>>  edge
>>  dse_dom_walker::before_dom_children (basic_block bb)
>>  {
>> @@ -391,7 +679,7 @@ pass_dse::execute (function *fun)
>>      }
>>
>>    BITMAP_FREE (need_eh_cleanup);
>> -
>> +
>>    /* For now, just wipe the post-dominator information.  */
>>    free_dominance_info (CDI_POST_DOMINATORS);
>>    return 0;
>>
Jeff Law Jan. 12, 2017, 4:50 a.m. UTC | #3
On 01/04/2017 06:22 AM, Richard Biener wrote:
>>>
>> Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?
>
> New functions in sbitmap.c lack function comments.
Bah.  Sophomoric on my part.  Fixed.

>
> bitmap_count_bits fails to guard against GCC_VERSION >= 3400 (the version
> is not important, but non-GCC host compilers are).  See bitmap.c for a
> fallback.
Mistake on my part.  I keep thinking we support starting the bootstrap 
process with the most recently released GCC, but we support 3.4 as well 
as other C++98/C++03 compilers.  Fixed in the next update (and tested by 
forcing the fallback method).
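
For reference, a fallback of the sort Richard is pointing at can be written 
without __builtin_popcount* at all.  This is only an illustrative sketch (the 
helper name is hypothetical and the real bitmap.c fallback may differ in 
detail); it counts bits a nibble at a time via a small lookup table:

```c
/* Hypothetical portable fallback for counting the set bits in one
   bitmap word, for host compilers where __builtin_popcountl is not
   available (GCC_VERSION < 3400).  Counts four bits per iteration
   using a 16-entry lookup table.  */
static unsigned int
popcount_fallback (unsigned long word)
{
  static const unsigned char nibble_bits[16]
    = { 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4 };
  unsigned int count = 0;
  while (word)
    {
      count += nibble_bits[word & 0xf];
      word >>= 4;
    }
  return count;
}
```

The guarded builtin and this fallback are interchangeable, which is what makes 
it easy to test the fallback path by forcing it on a compiler that does have 
the builtin.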

>
> Both bitmap_clear_range and bitmap_set_range look rather inefficient...
> (it's not likely anybody will clean this up after you)
They were, but not anymore.  Now they build a mask to deal with any 
partial clearing/setting in the first word, then a single memset for any 
whole words in the middle, then another masking operation on residuals 
in the last word.  Verified behavior by keeping two bitmaps, one with 
the old slow approach and one with the faster implementation and 
checking for equality.  Obviously those verification bits won't be in 
the final patch.
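
The first-word-mask / memset-middle / last-word-mask scheme described above 
can be sketched as follows.  This operates on a plain array of words rather 
than GCC's sbitmap type, so the function name and details are illustrative 
only, not the committed implementation:

```c
#include <string.h>

#define WORD_BITS (sizeof (unsigned long) * 8)

/* Illustrative sketch of a faster bitmap_set_range: mask in the
   partial first word, set any whole middle words with one memset,
   then mask in the residual bits of the last word.  */
static void
set_range_sketch (unsigned long *bits, unsigned int start,
		  unsigned int count)
{
  if (count == 0)
    return;

  unsigned int last = start + count - 1;
  unsigned int first_word = start / WORD_BITS;
  unsigned int last_word = last / WORD_BITS;
  unsigned int start_bit = start % WORD_BITS;
  unsigned int end_bit = last % WORD_BITS;

  if (first_word == last_word)
    {
      /* The whole range lives in one word: combine both masks.  */
      unsigned long mask = (~0UL >> (WORD_BITS - 1 - end_bit))
			   & (~0UL << start_bit);
      bits[first_word] |= mask;
      return;
    }

  /* Partial first word.  */
  bits[first_word] |= ~0UL << start_bit;

  /* Whole words in the middle via a single memset.  */
  if (last_word > first_word + 1)
    memset (&bits[first_word + 1], 0xff,
	    (last_word - first_word - 1) * sizeof (unsigned long));

  /* Residual bits in the last word.  */
  bits[last_word] |= ~0UL >> (WORD_BITS - 1 - end_bit);
}
```

The clearing variant has the same shape, with `&= ~mask` on the partial words 
and a memset to zero in the middle.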

> I'd say split out the sbitmap.[ch] changes.
Sure.  That's easy enough.

>
> +DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
> +        "dse-max-object-size",
> +        "Maximum size (in bytes) of objects tracked by dead store
> elimination.",
> +        256, 0, 0)
>
> the docs suggest that DSE doesn't handle larger stores but it does (just in
> the original limited way).  Maybe "tracked bytewise" is better.
Agreed and fixed.


>
> +static bool
> +valid_ao_ref_for_dse (ao_ref *ref)
> +{
> +  return (ao_ref_base (ref)
> +         && ref->max_size != -1
> +         && (ref->offset % BITS_PER_UNIT) == 0
> +         && (ref->size % BITS_PER_UNIT) == 0
> +         && (ref->size / BITS_PER_UNIT) > 0);
>
> I think the last test is better written as ref->size != -1.
Seems reasonable.  Fixed.


>
> Btw, seeing you discount non-byte size/offset stores this somehow asks
> for store-merging being done before the last DSE (it currently runs after).
> Sth to keep in mind for GCC 8.
Yea, probably.  Of course it may also be the case that DSE enables store 
merging.  Worth some experimentation.


>
> +/* Delete a dead store STMT, which is mem* call of some kind.  */
>
> call STMT
Fixed.

>
> +static void
> +delete_dead_call (gimple *stmt)
> +{
> +
> excess vertical space
Likewise.

> ......
> +  if (lhs)
> +    {
> +      tree ptr = gimple_call_arg (stmt, 0);
> +      gimple *new_stmt = gimple_build_assign (lhs, ptr);
> +      unlink_stmt_vdef (stmt);
> +      if (gsi_replace (&gsi, new_stmt, true))
> +        bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
>
>   release_ssa_name (gimple_vdef (stmt));
>
>
> +  { m_live_bytes = sbitmap_alloc (PARAM_VALUE
> (PARAM_DSE_MAX_OBJECT_SIZE));m_byte_tracking_enabled = false; }
>
> formatting.
Yea. Fixed.

>
> The DSE parts look good to me with the nits above fixed.  Just re-spin
> the sbitmap.[ch] part please.
Will repost the sbitmap.c bits after retesting the series.

jeff
Jeff Law Jan. 12, 2017, 4:55 a.m. UTC | #4
On 01/04/2017 06:23 AM, Richard Biener wrote:
> On Wed, Jan 4, 2017 at 2:22 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Thu, Dec 22, 2016 at 7:26 AM, Jeff Law <law@redhat.com> wrote:
>>> This is the first of the 4 part patchkit to address deficiencies in our DSE
>>> implementation.
>>>
>>> This patch addresses the P2 regression 33562 which has been a low priority
>>> regression since gcc-4.3.  To summarize, DSE no longer has the ability to
>>> detect an aggregate store as dead if subsequent stores are done in a
>>> piecemeal fashion.
>>>
>>> I originally tackled this by changing how we lower complex objects. That was
>>> sufficient to address 33562, but was reasonably rejected.
>>>
>>> This version attacks the problem by improving DSE to track stores to memory
>>> at a byte level.  That allows us to determine if a series of stores
>>> completely covers an earlier store (thus making the earlier store dead).
>>>
>>> A useful side effect of this is we can detect when parts of a store are dead
>>> and potentially rewrite the store.  This patch implements that for complex
>>> object initializations.  While not strictly part of 33562, it's so closely
>>> related that I felt it belongs as part of this patch.
>>>
>>> This originally limited the size of the tracked memory space to 64 bytes.  I
>>> bumped the limit after working through the CONSTRUCTOR and mem* trimming
>>> patches.  The 256 byte limit is still fairly arbitrary and I wouldn't lose
>>> sleep if we throttled back to 64 or 128 bytes.
>>>
>>> Later patches in the kit will build upon this patch.  So if pieces look like
>>> skeleton code, that's because it is.
>>>
>>> The changes since the V2 patch are:
>>>
>>> 1. Using sbitmaps rather than bitmaps.
>>> 2. Returning a tri-state from dse_classify_store (renamed from
>>> dse_possible_dead_store_p)
>>> 3. More efficient trim computation
>>> 4. Moving trimming code out of dse_classify_store
>>> 5. Refactoring code to delete dead calls/assignments
>>> 6. dse_optimize_stmt moves into the dse_dom_walker class
>>>
>>> Not surprisingly, this patch has most of the changes based on prior feedback
>>> as it includes the raw infrastructure.
>>>
>>> Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?
>>
>> New functions in sbitmap.c lack function comments.
>>
>> bitmap_count_bits fails to guard against GCC_VERSION >= 3400 (the version
>> is not important, but non-GCC host compilers are).  See bitmap.c for a
>> fallback.
>>
>> Both bitmap_clear_range and bitmap_set_range look rather inefficient...
>> (it's not likely anybody will clean this up after you)
>>
>> I'd say split out the sbitmap.[ch] changes.
>>
>> +DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
>> +        "dse-max-object-size",
>> +        "Maximum size (in bytes) of objects tracked by dead store
>> elimination.",
>> +        256, 0, 0)
>>
>> the docs suggest that DSE doesn't handle larger stores but it does (just in
>> the original limited way).  Maybe "tracked bytewise" is better.
>
> Oh, and new --params need documenting in invoke.texi.
Fixed.

jeff
diff mbox

Patch

diff --git a/gcc/params.def b/gcc/params.def
index 50f75a7..f367c1d 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -532,6 +532,11 @@  DEFPARAM(PARAM_AVG_LOOP_NITER,
 	 "Average number of iterations of a loop.",
 	 10, 1, 0)
 
+DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
+	 "dse-max-object-size",
+	 "Maximum size (in bytes) of objects tracked by dead store elimination.",
+	 256, 0, 0)
+
 DEFPARAM(PARAM_SCEV_MAX_EXPR_SIZE,
  	 "scev-max-expr-size",
 	 "Bound on size of expressions used in the scalar evolutions analyzer.",
diff --git a/gcc/sbitmap.c b/gcc/sbitmap.c
index 10b4347..2b66a6c 100644
--- a/gcc/sbitmap.c
+++ b/gcc/sbitmap.c
@@ -202,6 +202,39 @@  bitmap_empty_p (const_sbitmap bmap)
   return true;
 }
 
+void
+bitmap_clear_range (sbitmap bmap, unsigned int start, unsigned int count)
+{
+  for (unsigned int i = start; i < start + count; i++)
+    bitmap_clear_bit (bmap, i);
+}
+
+void
+bitmap_set_range (sbitmap bmap, unsigned int start, unsigned int count)
+{
+  for (unsigned int i = start; i < start + count; i++)
+    bitmap_set_bit (bmap, i);
+}
+
+
+unsigned int
+bitmap_count_bits (const_sbitmap bmap)
+{
+  unsigned int count = 0;
+  for (unsigned int i = 0; i < bmap->size; i++)
+    if (bmap->elms[i])
+      {
+# if HOST_BITS_PER_WIDEST_FAST_INT == HOST_BITS_PER_LONG
+	count += __builtin_popcountl (bmap->elms[i]);
+# elif HOST_BITS_PER_WIDEST_FAST_INT == HOST_BITS_PER_LONGLONG
+	count += __builtin_popcountll (bmap->elms[i]);
+# else
+	count += __builtin_popcount (bmap->elms[i]);
+# endif
+      }
+  return count;
+}
+
 
 /* Zero all elements in a bitmap.  */
 
diff --git a/gcc/sbitmap.h b/gcc/sbitmap.h
index bd734f9..aa43d0d 100644
--- a/gcc/sbitmap.h
+++ b/gcc/sbitmap.h
@@ -231,8 +231,11 @@  extern sbitmap *sbitmap_vector_alloc (unsigned int, unsigned int);
 extern sbitmap sbitmap_resize (sbitmap, unsigned int, int);
 extern void bitmap_copy (sbitmap, const_sbitmap);
 extern int bitmap_equal_p (const_sbitmap, const_sbitmap);
+extern unsigned int bitmap_count_bits (const_sbitmap);
 extern bool bitmap_empty_p (const_sbitmap);
 extern void bitmap_clear (sbitmap);
+extern void bitmap_clear_range (sbitmap, unsigned, unsigned);
+extern void bitmap_set_range (sbitmap, unsigned, unsigned);
 extern void bitmap_ones (sbitmap);
 extern void bitmap_vector_clear (sbitmap *, unsigned int);
 extern void bitmap_vector_ones (sbitmap *, unsigned int);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
index 87a2638..3155741 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
@@ -10,4 +10,4 @@  int f(void)
   return g(&t);
 }
 
-/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
index e2cd403..e6d027f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
@@ -8,4 +8,4 @@  int f(void)
  __imag__ t = 2;
 }
 
-/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
new file mode 100644
index 0000000..92b2df8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+int g(_Complex int*);
+int f(void)
+{
+  _Complex int t = 0;
+  int i, j;
+ __imag__ t += 2;
+  return g(&t);
+}
+
+
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
new file mode 100644
index 0000000..718b746
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+int g(_Complex int*);
+int f(void)
+{
+  _Complex int t = 0;
+  int i, j;
+ __real__ t += 2;
+  return g(&t);
+}
+
+
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c
new file mode 100644
index 0000000..4e14d9b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c
@@ -0,0 +1,13 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O1 -fno-tree-dce -fdump-tree-optimized" } */
+_Complex int t = 0;
+int f(void)
+{
+  t = 0;
+ __imag__ t = 2;
+}
+
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c
new file mode 100644
index 0000000..d1e0b85
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O1 -fno-tree-dce -fdump-tree-optimized" } */
+_Complex int t = 0;
+int f(void)
+{
+  t = 0;
+ __real__ t = 2;
+}
+
+/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
index 594c20c..ae48ddd 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
@@ -11,4 +11,4 @@  foo ()
 }
 
 /* We should eliminate the first assignment.  */
-/* { dg-final { scan-tree-dump-times "VDEF" 2 "dse1" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "VDEF" 2 "dse1" } } */
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 778b363..40acd83 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -33,6 +33,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-dfa.h"
 #include "domwalk.h"
 #include "tree-cfgcleanup.h"
+#include "params.h"
 
 /* This file implements dead store elimination.
 
@@ -68,14 +69,242 @@  along with GCC; see the file COPYING3.  If not see
    remove their dead edges eventually.  */
 static bitmap need_eh_cleanup;
 
+/* Return value from dse_classify_store */
+enum dse_store_status
+{
+  DSE_STORE_LIVE,
+  DSE_STORE_MAYBE_PARTIAL_DEAD,
+  DSE_STORE_DEAD
+};
+
+/* STMT is a statement that may write into memory.  Analyze it and
+   initialize WRITE to describe how STMT affects memory.
+
+   Return TRUE if the statement was analyzed, FALSE otherwise.
+
+   It is always safe to return FALSE.  But typically better optimization
+   can be achieved by analyzing more statements.  */
+
+static bool
+initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write)
+{
+  /* It's advantageous to handle certain mem* functions.  */
+  if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
+    {
+      switch (DECL_FUNCTION_CODE (gimple_call_fndecl (stmt)))
+	{
+	  case BUILT_IN_MEMCPY:
+	  case BUILT_IN_MEMMOVE:
+	  case BUILT_IN_MEMSET:
+	    {
+	      tree size = NULL_TREE;
+	      if (gimple_call_num_args (stmt) == 3)
+		size = gimple_call_arg (stmt, 2);
+	      tree ptr = gimple_call_arg (stmt, 0);
+	      ao_ref_init_from_ptr_and_size (write, ptr, size);
+	      return true;
+	    }
+	  default:
+	    break;
+	}
+    }
+  else if (is_gimple_assign (stmt))
+    {
+      ao_ref_init (write, gimple_assign_lhs (stmt));
+      return true;
+    }
+  return false;
+}
+
+/* Given REF from the alias oracle, return TRUE if it is a valid
+   memory reference for dead store elimination, false otherwise.
+
+   In particular, the reference must have a known base, known maximum
+   size, start at a byte offset and have a size that is one or more
+   bytes.   Finally the offset and size must be smaller than
+   PARAM_DSE_MAX_OBJECT_SIZE.  */
+
+static bool
+valid_ao_ref_for_dse (ao_ref *ref)
+{
+  return (ao_ref_base (ref)
+	  && ref->max_size != -1
+	  && (ref->offset % BITS_PER_UNIT) == 0
+	  && (ref->size % BITS_PER_UNIT) == 0
+	  && (ref->size / BITS_PER_UNIT) > 0);
+}
+
+/* Normalize COPY (an ao_ref) relative to REF.  Essentially when we are
+   done COPY will only refer to bytes found within REF.
+
+   We have already verified that COPY intersects at least one
+   byte with REF.  */
+
+static void
+normalize_ref (ao_ref *copy, ao_ref *ref)
+{
+  /* If COPY starts before REF, then reset the beginning of
+     COPY to match REF and decrease the size of COPY by the
+     number of bytes removed from COPY.  */
+  if (copy->offset < ref->offset)
+    {
+      copy->size -= (ref->offset - copy->offset);
+      copy->offset = ref->offset;
+    }
+
+  /* If COPY extends beyond REF, chop off its size appropriately.  */
+  if (copy->offset + copy->size > ref->offset + ref->size)
+    copy->size -= (copy->offset + copy->size - (ref->offset + ref->size));
+}
+
+/* Clear any bytes written by STMT from the bitmap LIVE_BYTES.  The base
+   address written by STMT must match the one found in REF, which must
+   have its base address previously initialized.
+
+   This routine must be conservative.  If we don't know the offset or
+   actual size written, assume nothing was written.  */
+
+static void
+clear_bytes_written_by (sbitmap live_bytes, gimple *stmt, ao_ref *ref)
+{
+  ao_ref write;
+  if (!initialize_ao_ref_for_dse (stmt, &write))
+    return;
+
+  /* Verify we have the same base memory address, the write
+     has a known size and overlaps with REF.  */
+  if (valid_ao_ref_for_dse (&write)
+      && write.base == ref->base
+      && write.size == write.max_size
+      && ((write.offset < ref->offset
+	   && write.offset + write.size > ref->offset)
+	  || (write.offset >= ref->offset
+	      && write.offset < ref->offset + ref->size)))
+    {
+      normalize_ref (&write, ref);
+      bitmap_clear_range (live_bytes,
+			  (write.offset - ref->offset) / BITS_PER_UNIT,
+			  write.size / BITS_PER_UNIT);
+    }
+}
+
+/* REF is a memory write.  Extract relevant information from it and
+   initialize the LIVE_BYTES bitmap.  If successful, return TRUE.
+   Otherwise return FALSE.  */
+
+static bool
+setup_live_bytes_from_ref (ao_ref *ref, sbitmap live_bytes)
+{
+  if (valid_ao_ref_for_dse (ref)
+      && (ref->size / BITS_PER_UNIT
+	  <= PARAM_VALUE (PARAM_DSE_MAX_OBJECT_SIZE)))
+    {
+      bitmap_clear (live_bytes);
+      bitmap_set_range (live_bytes, 0, ref->size / BITS_PER_UNIT);
+      return true;
+    }
+  return false;
+}
+
+/* Compute the number of elements that we can trim from the head and
+   tail of ORIG resulting in a bitmap that is a superset of LIVE.
+
+   Store the number of elements trimmed from the head and tail in
+   TRIM_HEAD and TRIM_TAIL.  */
+
+static void
+compute_trims (ao_ref *ref, sbitmap live, int *trim_head, int *trim_tail)
+{
+  /* We use sbitmaps biased such that ref->offset is bit zero and the bitmap
+     extends through ref->size.  So we know that in the original bitmap
+     bits 0..ref->size were true.  We don't actually need the bitmap, just
+     the REF to compute the trims.  */
+
+  /* Now identify how much, if any of the tail we can chop off.  */
+  *trim_tail = 0;
+  int last_orig = (ref->size / BITS_PER_UNIT) - 1;
+  int last_live = bitmap_last_set_bit (live);
+  *trim_tail = (last_orig - last_live) & ~0x1;
+
+  /* Identify how much, if any of the head we can chop off.  */
+  int first_orig = 0;
+  int first_live = bitmap_first_set_bit (live);
+  *trim_head = (first_live - first_orig) & ~0x1;
+}
+
+/* STMT initializes an object from COMPLEX_CST where one or more of the
+   bytes written may be dead stores.  REF is a representation of the
+   memory written.  LIVE is the bitmap of stores that are actually live.
+
+   Attempt to rewrite STMT so that only the real or imaginary part of
+   the object is actually stored.  */
+
+static void
+maybe_trim_complex_store (ao_ref *ref, sbitmap live, gimple *stmt)
+{
+  int trim_head, trim_tail;
+  compute_trims (ref, live, &trim_head, &trim_tail);
+
+  /* The amount of data trimmed from the head or tail must be at
+     least half the size of the object to ensure we're trimming
+     the entire real or imaginary half.  By writing things this
+     way we avoid more O(n) bitmap operations.  */
+  if (trim_tail * 2 >= ref->size / BITS_PER_UNIT)
+    {
+      /* TREE_REALPART is live */
+      tree x = TREE_REALPART (gimple_assign_rhs1 (stmt));
+      tree y = gimple_assign_lhs (stmt);
+      y = build1 (REALPART_EXPR, TREE_TYPE (x), y);
+      gimple_assign_set_lhs (stmt, y);
+      gimple_assign_set_rhs1 (stmt, x);
+    }
+  else if (trim_head * 2 >= ref->size / BITS_PER_UNIT)
+    {
+      /* TREE_IMAGPART is live */
+      tree x = TREE_IMAGPART (gimple_assign_rhs1 (stmt));
+      tree y = gimple_assign_lhs (stmt);
+      y = build1 (IMAGPART_EXPR, TREE_TYPE (x), y);
+      gimple_assign_set_lhs (stmt, y);
+      gimple_assign_set_rhs1 (stmt, x);
+    }
+
+  /* Other cases indicate parts of both the real and imag subobjects
+     are live.  We do not try to optimize those cases.  */
+}
+
+/* STMT is a memory write where one or more bytes written are dead
+   stores.  ORIG is the bitmap of bytes stored by STMT.  LIVE is the
+   bitmap of stores that are actually live.
+
+   Attempt to rewrite STMT so that it writes fewer memory locations.  Right
+   now we only support trimming at the start or end of the memory region.
+   It's not clear how much there is to be gained by trimming from the middle
+   of the region.  */
+
+static void
+maybe_trim_partially_dead_store (ao_ref *ref, sbitmap live, gimple *stmt)
+{
+  if (is_gimple_assign (stmt))
+    {
+      switch (gimple_assign_rhs_code (stmt))
+	{
+	case COMPLEX_CST:
+	  maybe_trim_complex_store (ref, live, stmt);
+	  break;
+	default:
+	  break;
+	}
+    }
+}
 
 /* A helper of dse_optimize_stmt.
    Given a GIMPLE_ASSIGN in STMT that writes to REF, find a candidate
    statement *USE_STMT that may prove STMT to be dead.
    Return TRUE if the above conditions are met, otherwise FALSE.  */
 
-static bool
-dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
+static dse_store_status
+dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
+		    bool byte_tracking_enabled, sbitmap live_bytes)
 {
   gimple *temp;
   unsigned cnt = 0;
@@ -97,7 +326,7 @@  dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
       /* Limit stmt walking to be linear in the number of possibly
          dead stores.  */
       if (++cnt > 256)
-	return false;
+	return DSE_STORE_LIVE;
 
       if (gimple_code (temp) == GIMPLE_PHI)
 	defvar = PHI_RESULT (temp);
@@ -135,7 +364,7 @@  dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
 		  fail = true;
 		  BREAK_FROM_IMM_USE_STMT (ui);
 		}
-	      /* Do not consider the PHI as use if it dominates the 
+	      /* Do not consider the PHI as use if it dominates the
 	         stmt defining the virtual operand we are processing,
 		 we have processed it already in this case.  */
 	      if (gimple_bb (defvar_def) != gimple_bb (use_stmt)
@@ -164,7 +393,13 @@  dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
 	}
 
       if (fail)
-	return false;
+	{
+	  /* STMT might be partially dead and we may be able to reduce
+	     how many memory locations it stores into.  */
+	  if (byte_tracking_enabled && !gimple_clobber_p (stmt))
+	    return DSE_STORE_MAYBE_PARTIAL_DEAD;
+	  return DSE_STORE_LIVE;
+	}
 
       /* If we didn't find any definition this means the store is dead
          if it isn't a store to global reachable memory.  In this case
@@ -172,20 +407,100 @@  dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
       if (!temp)
 	{
 	  if (ref_may_alias_global_p (ref))
-	    return false;
+	    return DSE_STORE_LIVE;
 
 	  temp = stmt;
 	  break;
 	}
+
+      if (byte_tracking_enabled && temp)
+	clear_bytes_written_by (live_bytes, temp, ref);
     }
-  /* Continue walking until we reach a kill.  */
-  while (!stmt_kills_ref_p (temp, ref));
+  /* Continue walking until we reach a full kill as a single statement
+     or there are no more live bytes.  */
+  while (!stmt_kills_ref_p (temp, ref)
+	 && !(byte_tracking_enabled && bitmap_empty_p (live_bytes)));
 
   *use_stmt = temp;
+  return DSE_STORE_DEAD;
+}
+
+
+class dse_dom_walker : public dom_walker
+{
+public:
+  dse_dom_walker (cdi_direction direction) : dom_walker (direction)
+  { m_live_bytes = sbitmap_alloc (PARAM_VALUE (PARAM_DSE_MAX_OBJECT_SIZE));m_byte_tracking_enabled = false; }
+
+  ~dse_dom_walker () { sbitmap_free (m_live_bytes); }
+
+  virtual edge before_dom_children (basic_block);
 
-  return true;
+private:
+  sbitmap m_live_bytes;
+  bool m_byte_tracking_enabled;
+  void dse_optimize_stmt (gimple_stmt_iterator *);
+};
+
+/* Delete a dead store STMT, which is mem* call of some kind.  */
+static void
+delete_dead_call (gimple *stmt)
+{
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "  Deleted dead call: ");
+      print_gimple_stmt (dump_file, stmt, dump_flags, 0);
+      fprintf (dump_file, "\n");
+    }
+
+  tree lhs = gimple_call_lhs (stmt);
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  if (lhs)
+    {
+      tree ptr = gimple_call_arg (stmt, 0);
+      gimple *new_stmt = gimple_build_assign (lhs, ptr);
+      unlink_stmt_vdef (stmt);
+      if (gsi_replace (&gsi, new_stmt, true))
+        bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
+    }
+  else
+    {
+      /* Then we need to fix the operand of the consuming stmt.  */
+      unlink_stmt_vdef (stmt);
+
+      /* Remove the dead store.  */
+      if (gsi_remove (&gsi, true))
+	bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
+      release_defs (stmt);
+    }
 }
 
+/* Delete a dead store STMT, which is a gimple assignment. */
+
+static void
+delete_dead_assignment (gimple *stmt)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "  Deleted dead store: ");
+      print_gimple_stmt (dump_file, stmt, dump_flags, 0);
+      fprintf (dump_file, "\n");
+    }
+
+  /* Then we need to fix the operand of the consuming stmt.  */
+  unlink_stmt_vdef (stmt);
+
+  /* Remove the dead store.  */
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  basic_block bb = gimple_bb (stmt);
+  if (gsi_remove (&gsi, true))
+    bitmap_set_bit (need_eh_cleanup, bb->index);
+
+  /* And release any SSA_NAMEs set in this statement back to the
+     SSA_NAME manager.  */
+  release_defs (stmt);
+}
 
 /* Attempt to eliminate dead stores in the statement referenced by BSI.
 
@@ -198,8 +513,8 @@  dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
    is used precisely once by a later store to the same location which
    post dominates the first store, then the first store is dead.  */
 
-static void
-dse_optimize_stmt (gimple_stmt_iterator *gsi)
+void
+dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi)
 {
   gimple *stmt = gsi_stmt (*gsi);
 
@@ -214,6 +529,10 @@  dse_optimize_stmt (gimple_stmt_iterator *gsi)
 	  || TREE_CODE (gimple_assign_lhs (stmt)) != MEM_REF))
     return;
 
+  ao_ref ref;
+  if (!initialize_ao_ref_for_dse (stmt, &ref))
+    return;
+
   /* We know we have virtual definitions.  We can handle assignments and
      some builtin calls.  */
   if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
@@ -225,42 +544,26 @@  dse_optimize_stmt (gimple_stmt_iterator *gsi)
 	  case BUILT_IN_MEMSET:
 	    {
 	      gimple *use_stmt;
-	      ao_ref ref;
-	      tree size = NULL_TREE;
-	      if (gimple_call_num_args (stmt) == 3)
-		size = gimple_call_arg (stmt, 2);
-	      tree ptr = gimple_call_arg (stmt, 0);
-	      ao_ref_init_from_ptr_and_size (&ref, ptr, size);
-	      if (!dse_possible_dead_store_p (&ref, stmt, &use_stmt))
+	      enum dse_store_status store_status;
+	      m_byte_tracking_enabled
+		= setup_live_bytes_from_ref (&ref, m_live_bytes);
+	      store_status = dse_classify_store (&ref, stmt, &use_stmt,
+						 m_byte_tracking_enabled,
+						 m_live_bytes);
+	      if (store_status == DSE_STORE_LIVE)
 		return;
 
-	      if (dump_file && (dump_flags & TDF_DETAILS))
-		{
-		  fprintf (dump_file, "  Deleted dead call: ");
-		  print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0);
-		  fprintf (dump_file, "\n");
-		}
-
-	      tree lhs = gimple_call_lhs (stmt);
-	      if (lhs)
+	      if (store_status == DSE_STORE_MAYBE_PARTIAL_DEAD)
 		{
-		  gimple *new_stmt = gimple_build_assign (lhs, ptr);
-		  unlink_stmt_vdef (stmt);
-		  if (gsi_replace (gsi, new_stmt, true))
-		    bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
+		  maybe_trim_partially_dead_store (&ref, m_live_bytes, stmt);
+		  return;
 		}
-	      else
-		{
-		  /* Then we need to fix the operand of the consuming stmt.  */
-		  unlink_stmt_vdef (stmt);
 
-		  /* Remove the dead store.  */
-		  if (gsi_remove (gsi, true))
-		    bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
-		  release_defs (stmt);
-		}
-	      break;
+	      if (store_status == DSE_STORE_DEAD)
+		delete_dead_call (stmt);
+	      return;
 	    }
+
 	  default:
 	    return;
 	}
@@ -276,10 +579,20 @@  dse_optimize_stmt (gimple_stmt_iterator *gsi)
 	use_stmt = stmt;
       else
 	{
-	  ao_ref ref;
-	  ao_ref_init (&ref, gimple_assign_lhs (stmt));
-  	  if (!dse_possible_dead_store_p (&ref, stmt, &use_stmt))
+	  m_byte_tracking_enabled
+	    = setup_live_bytes_from_ref (&ref, m_live_bytes);
+	  enum dse_store_status store_status;
+	  store_status = dse_classify_store (&ref, stmt, &use_stmt,
+					     m_byte_tracking_enabled,
+					     m_live_bytes);
+	  if (store_status == DSE_STORE_LIVE)
 	    return;
+
+	  if (store_status == DSE_STORE_MAYBE_PARTIAL_DEAD)
+	    {
+	      maybe_trim_partially_dead_store (&ref, m_live_bytes, stmt);
+	      return;
+	    }
 	}
 
       /* Now we know that use_stmt kills the LHS of stmt.  */
@@ -290,35 +603,10 @@  dse_optimize_stmt (gimple_stmt_iterator *gsi)
 	  && !gimple_clobber_p (use_stmt))
 	return;
 
-      if (dump_file && (dump_flags & TDF_DETAILS))
-	{
-	  fprintf (dump_file, "  Deleted dead store: ");
-	  print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0);
-	  fprintf (dump_file, "\n");
-	}
-
-      /* Then we need to fix the operand of the consuming stmt.  */
-      unlink_stmt_vdef (stmt);
-
-      /* Remove the dead store.  */
-      basic_block bb = gimple_bb (stmt);
-      if (gsi_remove (gsi, true))
-	bitmap_set_bit (need_eh_cleanup, bb->index);
-
-      /* And release any SSA_NAMEs set in this statement back to the
-	 SSA_NAME manager.  */
-      release_defs (stmt);
+      delete_dead_assignment (stmt);
     }
 }
 
-class dse_dom_walker : public dom_walker
-{
-public:
-  dse_dom_walker (cdi_direction direction) : dom_walker (direction) {}
-
-  virtual edge before_dom_children (basic_block);
-};
-
 edge
 dse_dom_walker::before_dom_children (basic_block bb)
 {
@@ -391,7 +679,7 @@  pass_dse::execute (function *fun)
     }
 
   BITMAP_FREE (need_eh_cleanup);
-    
+
   /* For now, just wipe the post-dominator information.  */
   free_dominance_info (CDI_POST_DOMINATORS);
   return 0;