Patchwork New attempt: subword-based DCE, PR42575

Submitter Bernd Schmidt
Date July 28, 2010, 7 p.m.
Message ID <4C507E3C.7060900@codesourcery.com>
Permalink /patch/60169/
State New

Comments

Bernd Schmidt - July 28, 2010, 7 p.m.
PR42575 is an example where we're creating a dead store into a subword
of a multiword pseudo register, which then messes up register
allocation.  I tried to fix this earlier, and it was another issue where
I feel the patch review process has so far failed.

The patch below replaces the overengineered, broken and disabled
byte-level DCE with a simpler word-level DCE, run from lower-subreg.
Previously I made the mistake of letting myself get influenced by
worries about compile-time performance which are IMO completely
overblown.  There are a number of reasons why compile-time is a
non-issue in this case, never mind the general observation that
compilation is embarrassingly parallel in practice:

* the code is guarded by a test in lower-subreg which verifies that
multiword pseudos exist in the function; that alone makes the
compile-time impact exactly nil for the majority of code
* it's replacing an existing, more expensive pass which was approved
(although later disabled)

Because I took this concern too seriously, the previous patch was a weak
basic-block local DCE that was fast and fixed the testcase, but had no
other value.  This time, I've simply implemented it in the way I think
it should be done, taking the existing byte-level DCE and reducing it to
a fairly simple pass that only operates on two-word pseudos.
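
The word-level idea described above can be sketched outside GCC. This is a toy model under stated assumptions, not the patch's code: `toy_insn`, `word_mask` and `toy_word_dce` are hypothetical names, a plain `uint32_t` stands in for GCC's bitmaps, and every register is treated as a two-word pseudo with liveness bits at `2 * regno` and `2 * regno + 1`. Each insn has at most one def and one use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; none of these names exist in GCC.  */
enum { TOY_MAX_REGS = 16 };   /* two bits per register fit in 32 bits */

/* WORD is 0 (low word), 1 (high word) or -1 (whole register).  */
struct toy_insn
{
  int def_reg, def_word;   /* def_reg < 0 means the insn defines nothing */
  int use_reg, use_word;   /* use_reg < 0 means the insn uses nothing */
  bool needed;             /* filled in by toy_word_dce */
};

/* Return the liveness bits covering WORD of REG.  */
static uint32_t
word_mask (int reg, int word)
{
  assert (reg >= 0 && reg < TOY_MAX_REGS);
  assert (word >= -1 && word <= 1);
  if (word < 0)
    return (uint32_t) 3 << (2 * reg);
  return (uint32_t) 1 << (2 * reg + word);
}

/* Scan one block backwards.  LIVE is the live-out set on entry; the
   return value is the live-in set.  An insn with a def is needed only
   if it defines a word that is live below it; uses of unneeded insns
   do not make anything live.  */
static uint32_t
toy_word_dce (struct toy_insn *insns, int n, uint32_t live)
{
  for (int i = n - 1; i >= 0; i--)
    {
      struct toy_insn *insn = &insns[i];

      if (insn->def_reg >= 0)
        {
          uint32_t m = word_mask (insn->def_reg, insn->def_word);
          insn->needed = (live & m) != 0;
          live &= ~m;   /* the defined words are dead above this insn */
        }
      else
        insn->needed = true;   /* nothing tracked: keep conservatively */

      if (insn->needed && insn->use_reg >= 0)
        live |= word_mask (insn->use_reg, insn->use_word);
    }
  return live;
}
```

For a block that stores the low word of register 1, then the high word of register 1, then computes register 2 from the low word of register 1, with only register 2 live on exit, the scan marks the high-word store as unneeded. A register-level liveness set with one bit per pseudo cannot tell the two stores apart in this way.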

This triggers in several testcases other than the one in PR42575, for
example in crafty or the XFS filesystem (tested with an ARM compiler).
This demonstrates that it fixes a real-world, general problem that can
occur for more than one reason.

 ./spec2k/186.crafty/run/00000004/attacks.s
-  0 .text         00000b28  00000000  00000000  00000034  2**2
+  0 .text         00000a40  00000000  00000000  00000034  2**2
 ./spec2k/186.crafty/run/00000004/swap.s
-  0 .text         000005a0  00000000  00000000  00000034  2**2
+  0 .text         00000500  00000000  00000000  00000034  2**2

That seems like a large difference - I've looked at RTL dumps, and from
what I've seen, what it's doing appears sane.

Last time, Eric posted counter-patches which tried hackish ways of
tweaking the generated RTL so that the problem is hidden for the PR42575
testcase; both of these made code generation worse on average and failed
to fix the more general issue of dead stores into subwords.  One of them
contained a useful idea, which is to allow lower-subreg to split regs
even if they are stored in full in a non-decomposable context.  I've
implemented this in a more general way, but it can't be used just yet
because a number of x86 testcases require that certain DImode regs are
_not_ split (e.g. because they are copied from an MMX hard register).
It'll require somewhat more work to add the necessary cost stuff to
lower-subreg.

Hence, I think this new DCE patch should go in instead.  Bootstrapped
and regression tested on i686-linux.  Ok?


Bernd

PR rtl-optimization/42575
	* dce.c (word_dce_process_block): Renamed from byte_dce_process_block.
	Argument AU removed.  All callers changed.  Ignore artificial refs.
	Use return value of df_word_lr_simulate_defs to decide whether an insn
	is necessary.
	(fast_dce): Rename arg to WORD_LEVEL.
	(run_word_dce): Renamed from rest_of_handle_fast_byte_dce.  No longer
	static.
	(pass_fast_rtl_byte_dce): Delete.
	* dce.h (run_word_dce): Declare.
	* df-core.c (df_print_word_regset): Renamed from df_print_byteregset.
	All callers changed.  Simplify code to only deal with two-word regs.
	* df.h (DF_WORD_LR): Renamed from DF_BYTE_LR.
	(DF_WORD_LR_BB_INFO): Renamed from DF_BYTE_LR_BB_INFO.
	(DF_WORD_LR_IN): Renamed from DF_BYTE_LR_IN.
	(DF_WORD_LR_OUT): Renamed from DF_BYTE_LR_OUT.
	(struct df_word_lr_bb_info): Renamed from df_byte_lr_bb_info.
	(df_word_lr_mark_ref): Declare.
	(df_word_lr_add_problem, df_word_lr_mark_ref, df_word_lr_simulate_defs,
	df_word_lr_simulate_uses): Declare or rename from byte variants.
	(df_byte_lr_simulate_artificial_refs_at_top,
	df_byte_lr_simulate_artificial_refs_at_end, df_byte_lr_get_regno_start,
	df_byte_lr_get_regno_len): Delete declarations.
	(df_word_lr_get_bb_info): Rename from df_byte_lr_get_bb_info.
	* df-problems.c (df_word_lr_problem_data): Renamed from
	df_byte_lr_problem_data, all members deleted except for
	WORD_LR_BITMAPS, which is renamed from BYTE_LR_BITMAPS.  Uses changed.
	(df_byte_lr_expand_bitmap, df_byte_lr_simulate_artificial_refs_at_top,
	df_byte_lr_simulate_artificial_refs_at_end, df_byte_lr_get_regno_start,
	df_byte_lr_get_regno_len, df_byte_lr_check_regs,
	df_byte_lr_confluence_0): Delete functions.
	(df_word_lr_free_bb_info): Renamed from df_byte_lr_free_bb_info; all
	callers changed.
	(df_word_lr_alloc): Renamed from df_byte_lr_alloc; all callers changed.
	Don't initialize members that were deleted, don't try to discover data
	about registers.  Ignore hard regs.
	(df_word_lr_reset): Renamed from df_byte_lr_reset; all callers changed.
	(df_word_lr_mark_ref): New function.
	(df_word_lr_bb_local_compute): Renamed from
	df_byte_bb_lr_local_compute; all callers changed.  Use
	df_word_lr_mark_ref.  Assert that artificial refs don't include
	pseudos.  Ignore hard registers.
	(df_word_lr_local_compute): Renamed from df_byte_lr_local_compute.
	Assert that exit block uses don't contain pseudos.
	(df_word_lr_init): Renamed from df_byte_lr_init; all callers changed.
	(df_word_lr_confluence_n): Renamed from df_byte_lr_confluence_n; all
	callers changed.  Ignore hard regs.
	(df_word_lr_transfer_function): Renamed from
	df_byte_lr_transfer_function; all callers changed.
	(df_word_lr_free): Renamed from df_byte_lr_free; all callers changed.
	(df_word_lr_top_dump): Renamed from df_byte_lr_top_dump; all callers
	changed.
	(df_word_lr_bottom_dump): Renamed from df_byte_lr_bottom_dump; all
	callers changed.
	(problem_WORD_LR): Renamed from problem_BYTE_LR; uses changed;
	confluence operator 0 set to NULL.
	(df_word_lr_add_problem): Renamed from df_byte_lr_add_problem; all
	callers changed.
	(df_word_lr_simulate_defs): Renamed from df_byte_lr_simulate_defs.
	Return bool, true if bitmap changed or insn otherwise necessary.
	All callers changed.  Simplify using df_word_lr_mark_ref.
	(df_word_lr_simulate_uses): Renamed from df_byte_lr_simulate_uses;
	all callers changed.  Simplify using df_word_lr_mark_ref.
	* lower-subreg.c: Include "dce.h"
	(decompose_multiword_subregs): Call run_word_dce if df available.
	* Makefile.in (lower-subreg.o): Adjust dependencies.
	* timevar.def (TV_DF_WORD_LR): Renamed from TV_DF_BYTE_LR.

Paolo Bonzini - July 29, 2010, 12:24 a.m.
> Hence, I think this new DCE patch should go in instead.  Bootstrapped
> and regression tested on i686-linux.  Ok?

Sure!  (I actually could not approve the two lines in lower-subreg.c, 
but hey).

Can you please remove also df-byte-scan.c and the corresponding 
definitions in df.h?  They should be unused after your patch.

Thanks very much for this.

Paolo

Eric Botcazou - July 29, 2010, 8:14 a.m.
> Previously I made the mistake of letting myself get influenced by
> worries about compile-time performance which are IMO completely
> overblown.  There are a number of reasons why compile-time is a
> non-issue in this case, never mind the general observation that
> compilation is embarrassingly parallel in practice:
>
> * the code is guarded by a test in lower-subreg which verifies that
> multiword pseudos exist in the function; that alone makes the
> compile-time impact exactly nil for the majority of code
> * it's replacing an existing, more expensive pass which was approved
> (although later disabled)

A figure is worth a thousand words when it comes to compilation time.  Did you 
measure the impact on a bootstrap of the core compiler on x86 for example?

> This triggers in several testcases other than the one in PR42575, for
> example in crafty or the XFS filesystem (tested with an ARM compiler).
> This demonstrates that it fixes a real-world, general problem that can
> occur for more than one reason.

What reasons?  There are 2 ways to "fix" problems: analyzing the causes and 
eliminating them, or treating the symptoms.  I'm fond of the former approach, 
you're apparently more fond of the latter.

> Last time, Eric posted counter-patches which tried hackish ways of
> tweaking the generated RTL so that the problem is hidden for the PR42575
> testcase; both of these made code generation worse on average and failed
> to fix the more general issue of dead stores into subwords.

Well, I first asked you to investigate why lower-subreg + DCE couldn't achieve 
what you were looking for and whether this could be fixed.  You apparently 
didn't try very hard so I had to; early patches may indeed resemble hacks.

We already run 2 lower-subreg passes and 2+ DCE passes so we should be able to 
eliminate dead assignments to subwords of multiword pseudo-registers with a
little effort.  Instead we now have a new DCE pass and we still don't know what
it would have taken to enhance existing lower-subreg and DCE passes.

Bernd Schmidt - July 29, 2010, 12:40 p.m.
On 07/29/2010 02:24 AM, Paolo Bonzini wrote:
>> Hence, I think this new DCE patch should go in instead.  Bootstrapped
>> and regression tested on i686-linux.  Ok?
> 
> Sure!  (I actually could not approve the two lines in lower-subreg.c,
> but hey).
> 
> Can you please remove also df-byte-scan.c and the corresponding
> definitions in df.h?  They should be unused after your patch.

Thanks!  I've made this change, bootstrapped again, ran 186.crafty as an
extra test, and committed it.


Bernd

Patch

Index: gcc/dce.c
===================================================================
--- gcc.orig/dce.c
+++ gcc/dce.c
@@ -767,12 +767,11 @@  struct rtl_opt_pass pass_ud_rtl_dce =
    artificial uses. */
 
 static bool
-byte_dce_process_block (basic_block bb, bool redo_out, bitmap au)
+word_dce_process_block (basic_block bb, bool redo_out)
 {
   bitmap local_live = BITMAP_ALLOC (&dce_tmp_bitmap_obstack);
   rtx insn;
   bool block_changed;
-  df_ref *def_rec;
 
   if (redo_out)
     {
@@ -781,8 +780,8 @@  byte_dce_process_block (basic_block bb, 
 	 set.  */
       edge e;
       edge_iterator ei;
-      df_confluence_function_n con_fun_n = df_byte_lr->problem->con_fun_n;
-      bitmap_clear (DF_BYTE_LR_OUT (bb));
+      df_confluence_function_n con_fun_n = df_word_lr->problem->con_fun_n;
+      bitmap_clear (DF_WORD_LR_OUT (bb));
       FOR_EACH_EDGE (e, ei, bb->succs)
 	(*con_fun_n) (e);
     }
@@ -790,76 +789,38 @@  byte_dce_process_block (basic_block bb, 
   if (dump_file)
     {
       fprintf (dump_file, "processing block %d live out = ", bb->index);
-      df_print_byte_regset (dump_file, DF_BYTE_LR_OUT (bb));
+      df_print_word_regset (dump_file, DF_WORD_LR_OUT (bb));
     }
 
-  bitmap_copy (local_live, DF_BYTE_LR_OUT (bb));
-
-  df_byte_lr_simulate_artificial_refs_at_end (bb, local_live);
+  bitmap_copy (local_live, DF_WORD_LR_OUT (bb));
 
   FOR_BB_INSNS_REVERSE (bb, insn)
     if (INSN_P (insn))
       {
-	/* The insn is needed if there is someone who uses the output.  */
-	for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
-	  {
-	    df_ref def = *def_rec;
-	    unsigned int last;
-	    unsigned int dregno = DF_REF_REGNO (def);
-	    unsigned int start = df_byte_lr_get_regno_start (dregno);
-	    unsigned int len = df_byte_lr_get_regno_len (dregno);
-
-	    unsigned int sb;
-	    unsigned int lb;
-	    /* This is one of the only places where DF_MM_MAY should
-	       be used for defs.  Need to make sure that we are
-	       checking for all of the bits that may be used.  */
-
-	    if (!df_compute_accessed_bytes (def, DF_MM_MAY, &sb, &lb))
-	      {
-		start += sb;
-		len = lb - sb;
-	      }
-
-	    if (bitmap_bit_p (au, dregno))
-	      {
-		mark_insn (insn, true);
-		goto quickexit;
-	      }
-
-	    last = start + len;
-	    while (start < last)
-	      if (bitmap_bit_p (local_live, start++))
-		{
-		  mark_insn (insn, true);
-		  goto quickexit;
-		}
-	  }
-
-      quickexit:
-
+	bool any_changed;
 	/* No matter if the instruction is needed or not, we remove
 	   any regno in the defs from the live set.  */
-	df_byte_lr_simulate_defs (insn, local_live);
+	any_changed = df_word_lr_simulate_defs (insn, local_live);
+	if (any_changed)
+	  mark_insn (insn, true);
 
 	/* On the other hand, we do not allow the dead uses to set
 	   anything in local_live.  */
 	if (marked_insn_p (insn))
-	  df_byte_lr_simulate_uses (insn, local_live);
+	  df_word_lr_simulate_uses (insn, local_live);
 
 	if (dump_file)
 	  {
 	    fprintf (dump_file, "finished processing insn %d live out = ",
 		     INSN_UID (insn));
-	    df_print_byte_regset (dump_file, local_live);
+	    df_print_word_regset (dump_file, local_live);
 	  }
       }
 
-  df_byte_lr_simulate_artificial_refs_at_top (bb, local_live);
-
-  block_changed = !bitmap_equal_p (local_live, DF_BYTE_LR_IN (bb));
+  block_changed = !bitmap_equal_p (local_live, DF_WORD_LR_IN (bb));
   if (block_changed)
-    bitmap_copy (DF_BYTE_LR_IN (bb), local_live);
+    bitmap_copy (DF_WORD_LR_IN (bb), local_live);
+
   BITMAP_FREE (local_live);
   return block_changed;
 }
@@ -938,12 +899,12 @@  dce_process_block (basic_block bb, bool 
 }
 
 
-/* Perform fast DCE once initialization is done.  If BYTE_LEVEL is
-   true, use the byte level dce, otherwise do it at the pseudo
+/* Perform fast DCE once initialization is done.  If WORD_LEVEL is
+   true, use the word level dce, otherwise do it at the pseudo
    level.  */
 
 static void
-fast_dce (bool byte_level)
+fast_dce (bool word_level)
 {
   int *postorder = df_get_postorder (DF_BACKWARD);
   int n_blocks = df_get_n_blocks (DF_BACKWARD);
@@ -985,10 +946,9 @@  fast_dce (bool byte_level)
 	      continue;
 	    }
 
-	  if (byte_level)
+	  if (word_level)
 	    local_changed
-	      = byte_dce_process_block (bb, bitmap_bit_p (redo_out, index),
-					  bb_has_eh_pred (bb) ? au_eh : au);
+	      = word_dce_process_block (bb, bitmap_bit_p (redo_out, index));
 	  else
 	    local_changed
 	      = dce_process_block (bb, bitmap_bit_p (redo_out, index),
@@ -1028,8 +988,8 @@  fast_dce (bool byte_level)
 	     to redo the dataflow equations for the blocks that had a
 	     change at the top of the block.  Then we need to redo the
 	     iteration.  */
-	  if (byte_level)
-	    df_analyze_problem (df_byte_lr, all_blocks, postorder, n_blocks);
+	  if (word_level)
+	    df_analyze_problem (df_word_lr, all_blocks, postorder, n_blocks);
 	  else
 	    df_analyze_problem (df_lr, all_blocks, postorder, n_blocks);
 
@@ -1062,14 +1022,15 @@  rest_of_handle_fast_dce (void)
 
 /* Fast byte level DCE.  */
 
-static unsigned int
-rest_of_handle_fast_byte_dce (void)
+void
+run_word_dce (void)
 {
-  df_byte_lr_add_problem ();
+  timevar_push (TV_DCE);
+  df_word_lr_add_problem ();
   init_dce (true);
   fast_dce (true);
   fini_dce (true);
-  return 0;
+  timevar_pop (TV_DCE);
 }
 
 
@@ -1139,24 +1100,3 @@  struct rtl_opt_pass pass_fast_rtl_dce =
   TODO_ggc_collect                      /* todo_flags_finish */
  }
 };
-
-struct rtl_opt_pass pass_fast_rtl_byte_dce =
-{
- {
-  RTL_PASS,
-  "byte-dce",                           /* name */
-  gate_fast_dce,                        /* gate */
-  rest_of_handle_fast_byte_dce,         /* execute */
-  NULL,                                 /* sub */
-  NULL,                                 /* next */
-  0,                                    /* static_pass_number */
-  TV_DCE,                               /* tv_id */
-  0,                                    /* properties_required */
-  0,                                    /* properties_provided */
-  0,                                    /* properties_destroyed */
-  0,                                    /* todo_flags_start */
-  TODO_dump_func |
-  TODO_df_finish | TODO_verify_rtl_sharing |
-  TODO_ggc_collect                      /* todo_flags_finish */
- }
-};
Index: gcc/dce.h
===================================================================
--- gcc.orig/dce.h
+++ gcc/dce.h
@@ -20,6 +20,7 @@  along with GCC; see the file COPYING3.  
 #ifndef GCC_DCE_H
 #define GCC_DCE_H
 
+extern void run_word_dce (void);
 extern void run_fast_dce (void);
 extern void run_fast_df_dce (void);
 
Index: gcc/df-core.c
===================================================================
--- gcc.orig/df-core.c
+++ gcc/df-core.c
@@ -1919,58 +1919,33 @@  df_print_regset (FILE *file, bitmap r)
    debugging dump.  */
 
 void
-df_print_byte_regset (FILE *file, bitmap r)
+df_print_word_regset (FILE *file, bitmap r)
 {
   unsigned int max_reg = max_reg_num ();
-  bitmap_iterator bi;
 
   if (r == NULL)
     fputs (" (nil)", file);
   else
     {
       unsigned int i;
-      for (i = 0; i < max_reg; i++)
+      for (i = FIRST_PSEUDO_REGISTER; i < max_reg; i++)
 	{
-	  unsigned int first = df_byte_lr_get_regno_start (i);
-	  unsigned int len = df_byte_lr_get_regno_len (i);
-
-	  if (len > 1)
+	  bool found = (bitmap_bit_p (r, 2 * i)
+			|| bitmap_bit_p (r, 2 * i + 1));
+	  if (found)
 	    {
-	      bool found = false;
-	      unsigned int j;
-
-	      EXECUTE_IF_SET_IN_BITMAP (r, first, j, bi)
-		{
-		  found = j < first + len;
-		  break;
-		}
-	      if (found)
-		{
-		  const char * sep = "";
-		  fprintf (file, " %d", i);
-		  if (i < FIRST_PSEUDO_REGISTER)
-		    fprintf (file, " [%s]", reg_names[i]);
-		  fprintf (file, "(");
-		  EXECUTE_IF_SET_IN_BITMAP (r, first, j, bi)
-		    {
-		      if (j > first + len - 1)
-			break;
-		      fprintf (file, "%s%d", sep, j-first);
-		      sep = ", ";
-		    }
-		  fprintf (file, ")");
-		}
+	      int word;
+	      const char * sep = "";
+	      fprintf (file, " %d", i);
+	      fprintf (file, "(");
+	      for (word = 0; word < 2; word++)
+		if (bitmap_bit_p (r, 2 * i + word))
+		  {
+		    fprintf (file, "%s%d", sep, word);
+		    sep = ", ";
+		  }
+	      fprintf (file, ")");
 	    }
-	  else
-	    {
-	      if (bitmap_bit_p (r, first))
-		{
-		  fprintf (file, " %d", i);
-		  if (i < FIRST_PSEUDO_REGISTER)
-		    fprintf (file, " [%s]", reg_names[i]);
-		}
-	    }
-
 	}
     }
   fprintf (file, "\n");
Index: gcc/df.h
===================================================================
--- gcc.orig/df.h
+++ gcc/df.h
@@ -52,7 +52,7 @@  union df_ref_d;
 #define DF_LIVE    2      /* Live Registers & Uninitialized Registers */
 #define DF_RD      3      /* Reaching Defs. */
 #define DF_CHAIN   4      /* Def-Use and/or Use-Def Chains. */
-#define DF_BYTE_LR 5      /* Subreg tracking lr.  */
+#define DF_WORD_LR 5      /* Subreg tracking lr.  */
 #define DF_NOTE    6      /* REG_DEF and REG_UNUSED notes. */
 #define DF_MD      7      /* Multiple Definitions. */
 
@@ -624,7 +624,7 @@  struct df_d
 #define DF_RD_BB_INFO(BB) (df_rd_get_bb_info((BB)->index))
 #define DF_LR_BB_INFO(BB) (df_lr_get_bb_info((BB)->index))
 #define DF_LIVE_BB_INFO(BB) (df_live_get_bb_info((BB)->index))
-#define DF_BYTE_LR_BB_INFO(BB) (df_byte_lr_get_bb_info((BB)->index))
+#define DF_WORD_LR_BB_INFO(BB) (df_word_lr_get_bb_info((BB)->index))
 #define DF_MD_BB_INFO(BB) (df_md_get_bb_info((BB)->index))
 
 /* Most transformations that wish to use live register analysis will
@@ -641,8 +641,8 @@  struct df_d
 /* These macros are used by passes that are not tolerant of
    uninitialized variables.  This intolerance should eventually
    be fixed.  */
-#define DF_BYTE_LR_IN(BB) (&DF_BYTE_LR_BB_INFO(BB)->in)
-#define DF_BYTE_LR_OUT(BB) (&DF_BYTE_LR_BB_INFO(BB)->out)
+#define DF_WORD_LR_IN(BB) (&DF_WORD_LR_BB_INFO(BB)->in)
+#define DF_WORD_LR_OUT(BB) (&DF_WORD_LR_BB_INFO(BB)->out)
 
 /* Macros to access the elements within the ref structure.  */
 
@@ -859,9 +859,11 @@  struct df_live_bb_info
 
 
 /* Live registers, a backwards dataflow problem.  These bitmaps are
-indexed by the df_byte_lr_offset array which is indexed by pseudo.  */
+   indexed by 2 * regno for each pseudo and have two entries for each
+   pseudo.  Only pseudos that have a size of 2 * UNITS_PER_WORD are
+   meaningfully tracked.  */
 
-struct df_byte_lr_bb_info
+struct df_word_lr_bb_info
 {
   /* Local sets to describe the basic blocks.  */
   bitmap_head def;   /* The set of registers set in this block
@@ -883,7 +885,7 @@  extern struct df_d *df;
 #define df_lr      (df->problems_by_index[DF_LR])
 #define df_live    (df->problems_by_index[DF_LIVE])
 #define df_chain   (df->problems_by_index[DF_CHAIN])
-#define df_byte_lr (df->problems_by_index[DF_BYTE_LR])
+#define df_word_lr (df->problems_by_index[DF_WORD_LR])
 #define df_note    (df->problems_by_index[DF_NOTE])
 #define df_md      (df->problems_by_index[DF_MD])
 
@@ -933,7 +935,7 @@  extern df_ref df_find_use (rtx, rtx);
 extern bool df_reg_used (rtx, rtx);
 extern void df_worklist_dataflow (struct dataflow *,bitmap, int *, int);
 extern void df_print_regset (FILE *file, bitmap r);
-extern void df_print_byte_regset (FILE *file, bitmap r);
+extern void df_print_word_regset (FILE *file, bitmap r);
 extern void df_dump (FILE *);
 extern void df_dump_region (FILE *);
 extern void df_dump_start (FILE *);
@@ -972,13 +974,12 @@  extern void df_live_verify_transfer_func
 extern void df_live_add_problem (void);
 extern void df_live_set_all_dirty (void);
 extern void df_chain_add_problem (unsigned int);
-extern void df_byte_lr_add_problem (void);
-extern int df_byte_lr_get_regno_start (unsigned int);
-extern int df_byte_lr_get_regno_len (unsigned int);
-extern void df_byte_lr_simulate_defs (rtx, bitmap);
-extern void df_byte_lr_simulate_uses (rtx, bitmap);
-extern void df_byte_lr_simulate_artificial_refs_at_top (basic_block, bitmap);
-extern void df_byte_lr_simulate_artificial_refs_at_end (basic_block, bitmap);
+extern void df_word_lr_add_problem (void);
+extern bool df_word_lr_mark_ref (df_ref, bool, bitmap);
+extern bool df_word_lr_simulate_defs (rtx, bitmap);
+extern void df_word_lr_simulate_uses (rtx, bitmap);
+extern void df_word_lr_simulate_artificial_refs_at_top (basic_block, bitmap);
+extern void df_word_lr_simulate_artificial_refs_at_end (basic_block, bitmap);
 extern void df_note_add_problem (void);
 extern void df_md_add_problem (void);
 extern void df_md_simulate_artificial_defs_at_top (basic_block, bitmap);
@@ -1081,11 +1082,11 @@  df_live_get_bb_info (unsigned int index)
     return NULL;
 }
 
-static inline struct df_byte_lr_bb_info *
-df_byte_lr_get_bb_info (unsigned int index)
+static inline struct df_word_lr_bb_info *
+df_word_lr_get_bb_info (unsigned int index)
 {
-  if (index < df_byte_lr->block_info_size)
-    return &((struct df_byte_lr_bb_info *) df_byte_lr->block_info)[index];
+  if (index < df_word_lr->block_info_size)
+    return &((struct df_word_lr_bb_info *) df_word_lr->block_info)[index];
   else
     return NULL;
 }
Index: gcc/df-problems.c
===================================================================
--- gcc.orig/df-problems.c
+++ gcc/df-problems.c
@@ -2286,84 +2286,31 @@  df_chain_add_problem (unsigned int chain
 
 
 /*----------------------------------------------------------------------------
-   BYTE LEVEL LIVE REGISTERS
+   WORD LEVEL LIVE REGISTERS
 
    Find the locations in the function where any use of a pseudo can
    reach in the backwards direction.  In and out bitvectors are built
-   for each basic block.  There are two mapping functions,
-   df_byte_lr_get_regno_start and df_byte_lr_get_regno_len that are
-   used to map regnos into bit vector positions.
-
-   This problem differs from the regular df_lr function in the way
-   that subregs, *_extracts and strict_low_parts are handled. In lr
-   these are consider partial kills, here, the exact set of bytes is
-   modeled.  Note that any reg that has none of these operations is
-   only modeled with a single bit since all operations access the
-   entire register.
-
-   This problem is more brittle that the regular lr.  It currently can
-   be used in dce incrementally, but cannot be used in an environment
-   where insns are created or modified.  The problem is that the
-   mapping of regnos to bitmap positions is relatively compact, in
-   that if a pseudo does not do any of the byte wise operations, only
-   one slot is allocated, rather than a slot for each byte.  If insn
-   are created, where a subreg is used for a reg that had no subregs,
-   the mapping would be wrong.  Likewise, there are no checks to see
-   that new pseudos have been added.  These issues could be addressed
-   by adding a problem specific flag to not use the compact mapping,
-   if there was a need to do so.
+   for each basic block.  We only track pseudo registers that have a
+   size of 2 * UNITS_PER_WORD; bitmaps are indexed by 2 * regno and
+   contain two bits corresponding to each of the subwords.
 
    ----------------------------------------------------------------------------*/
 
 /* Private data used to verify the solution for this problem.  */
-struct df_byte_lr_problem_data
+struct df_word_lr_problem_data
 {
-  /* Expanded versions of bitvectors used in lr.  */
-  bitmap_head invalidated_by_call;
-  bitmap_head hardware_regs_used;
-
-  /* Indexed by regno, this is true if there are subregs, extracts or
-     strict_low_parts for this regno.  */
-  bitmap_head needs_expansion;
-
-  /* The start position and len for each regno in the various bit
-     vectors.  */
-  unsigned int* regno_start;
-  unsigned int* regno_len;
   /* An obstack for the bitmaps we need for this problem.  */
-  bitmap_obstack byte_lr_bitmaps;
+  bitmap_obstack word_lr_bitmaps;
 };
 
 
-/* Get the starting location for REGNO in the df_byte_lr bitmaps.  */
-
-int
-df_byte_lr_get_regno_start (unsigned int regno)
-{
-  struct df_byte_lr_problem_data *problem_data
-    = (struct df_byte_lr_problem_data *)df_byte_lr->problem_data;;
-  return problem_data->regno_start[regno];
-}
-
-
-/* Get the len for REGNO in the df_byte_lr bitmaps.  */
-
-int
-df_byte_lr_get_regno_len (unsigned int regno)
-{
-  struct df_byte_lr_problem_data *problem_data
-    = (struct df_byte_lr_problem_data *)df_byte_lr->problem_data;;
-  return problem_data->regno_len[regno];
-}
-
-
 /* Free basic block info.  */
 
 static void
-df_byte_lr_free_bb_info (basic_block bb ATTRIBUTE_UNUSED,
+df_word_lr_free_bb_info (basic_block bb ATTRIBUTE_UNUSED,
 			 void *vbb_info)
 {
-  struct df_byte_lr_bb_info *bb_info = (struct df_byte_lr_bb_info *) vbb_info;
+  struct df_word_lr_bb_info *bb_info = (struct df_word_lr_bb_info *) vbb_info;
   if (bb_info)
     {
       bitmap_clear (&bb_info->use);
@@ -2374,65 +2321,21 @@  df_byte_lr_free_bb_info (basic_block bb 
 }
 
 
-/* Check all of the refs in REF_REC to see if any of them are
-   extracts, subregs or strict_low_parts.  */
-
-static void
-df_byte_lr_check_regs (df_ref *ref_rec)
-{
-  struct df_byte_lr_problem_data *problem_data
-    = (struct df_byte_lr_problem_data *)df_byte_lr->problem_data;
-
-  for (; *ref_rec; ref_rec++)
-    {
-      df_ref ref = *ref_rec;
-      if (DF_REF_FLAGS_IS_SET (ref, DF_REF_SIGN_EXTRACT
-			       | DF_REF_ZERO_EXTRACT
-			       | DF_REF_STRICT_LOW_PART)
-	  || GET_CODE (DF_REF_REG (ref)) == SUBREG)
-	bitmap_set_bit (&problem_data->needs_expansion, DF_REF_REGNO (ref));
-    }
-}
-
-
-/* Expand bitmap SRC which is indexed by regno to DEST which is indexed by
-   regno_start and regno_len.  */
-
-static void
-df_byte_lr_expand_bitmap (bitmap dest, bitmap src)
-{
-  struct df_byte_lr_problem_data *problem_data
-    = (struct df_byte_lr_problem_data *)df_byte_lr->problem_data;
-  bitmap_iterator bi;
-  unsigned int i;
-
-  bitmap_clear (dest);
-  EXECUTE_IF_SET_IN_BITMAP (src, 0, i, bi)
-    {
-      bitmap_set_range (dest, problem_data->regno_start[i],
-			problem_data->regno_len[i]);
-    }
-}
-
-
-/* Allocate or reset bitmaps for DF_BYTE_LR blocks. The solution bits are
+/* Allocate or reset bitmaps for DF_WORD_LR blocks. The solution bits are
    not touched unless the block is new.  */
 
 static void
-df_byte_lr_alloc (bitmap all_blocks ATTRIBUTE_UNUSED)
+df_word_lr_alloc (bitmap all_blocks ATTRIBUTE_UNUSED)
 {
   unsigned int bb_index;
   bitmap_iterator bi;
   basic_block bb;
-  unsigned int regno;
-  unsigned int index = 0;
-  unsigned int max_reg = max_reg_num();
-  struct df_byte_lr_problem_data *problem_data
-    = XNEW (struct df_byte_lr_problem_data);
+  struct df_word_lr_problem_data *problem_data
+    = XNEW (struct df_word_lr_problem_data);
 
-  df_byte_lr->problem_data = problem_data;
+  df_word_lr->problem_data = problem_data;
 
-  df_grow_bb_info (df_byte_lr);
+  df_grow_bb_info (df_word_lr);
 
   /* Create the mapping from regnos to slots. This does not change
      unless the problem is destroyed and recreated.  In particular, if
@@ -2440,58 +2343,17 @@  df_byte_lr_alloc (bitmap all_blocks ATTR
      want to redo the mapping because this would invalidate everything
      else.  */
 
-  bitmap_obstack_initialize (&problem_data->byte_lr_bitmaps);
-  problem_data->regno_start = XNEWVEC (unsigned int, max_reg);
-  problem_data->regno_len = XNEWVEC (unsigned int, max_reg);
-  bitmap_initialize (&problem_data->hardware_regs_used,
-		     &problem_data->byte_lr_bitmaps);
-  bitmap_initialize (&problem_data->invalidated_by_call,
-		     &problem_data->byte_lr_bitmaps);
-  bitmap_initialize (&problem_data->needs_expansion,
-		     &problem_data->byte_lr_bitmaps);
+  bitmap_obstack_initialize (&problem_data->word_lr_bitmaps);
 
-  /* Discover which regno's use subregs, extracts or
-     strict_low_parts.  */
   FOR_EACH_BB (bb)
-    {
-      rtx insn;
-      FOR_BB_INSNS (bb, insn)
-	{
-	  if (INSN_P (insn))
-	    {
-	      struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
-	      df_byte_lr_check_regs (DF_INSN_INFO_DEFS (insn_info));
-	      df_byte_lr_check_regs (DF_INSN_INFO_USES (insn_info));
-	    }
-	}
-      bitmap_set_bit (df_byte_lr->out_of_date_transfer_functions, bb->index);
-    }
+    bitmap_set_bit (df_word_lr->out_of_date_transfer_functions, bb->index);
 
-  bitmap_set_bit (df_byte_lr->out_of_date_transfer_functions, ENTRY_BLOCK);
-  bitmap_set_bit (df_byte_lr->out_of_date_transfer_functions, EXIT_BLOCK);
+  bitmap_set_bit (df_word_lr->out_of_date_transfer_functions, ENTRY_BLOCK);
+  bitmap_set_bit (df_word_lr->out_of_date_transfer_functions, EXIT_BLOCK);
 
-  /* Allocate the slots for each regno.  */
-  for (regno = 0; regno < max_reg; regno++)
+  EXECUTE_IF_SET_IN_BITMAP (df_word_lr->out_of_date_transfer_functions, 0, bb_index, bi)
     {
-      int len;
-      problem_data->regno_start[regno] = index;
-      if (bitmap_bit_p (&problem_data->needs_expansion, regno))
-	len = GET_MODE_SIZE (GET_MODE (regno_reg_rtx[regno]));
-      else
-	len = 1;
-
-      problem_data->regno_len[regno] = len;
-      index += len;
-    }
-
-  df_byte_lr_expand_bitmap (&problem_data->hardware_regs_used,
-			    &df->hardware_regs_used);
-  df_byte_lr_expand_bitmap (&problem_data->invalidated_by_call,
-			    regs_invalidated_by_call_regset);
-
-  EXECUTE_IF_SET_IN_BITMAP (df_byte_lr->out_of_date_transfer_functions, 0, bb_index, bi)
-    {
-      struct df_byte_lr_bb_info *bb_info = df_byte_lr_get_bb_info (bb_index);
+      struct df_word_lr_bb_info *bb_info = df_word_lr_get_bb_info (bb_index);
       
       /* When bitmaps are already initialized, just clear them.  */
       if (bb_info->use.obstack)
@@ -2501,74 +2363,109 @@  df_byte_lr_alloc (bitmap all_blocks ATTR
 	}
       else
 	{
-	  bitmap_initialize (&bb_info->use, &problem_data->byte_lr_bitmaps);
-	  bitmap_initialize (&bb_info->def, &problem_data->byte_lr_bitmaps);
-	  bitmap_initialize (&bb_info->in, &problem_data->byte_lr_bitmaps);
-	  bitmap_initialize (&bb_info->out, &problem_data->byte_lr_bitmaps);
+	  bitmap_initialize (&bb_info->use, &problem_data->word_lr_bitmaps);
+	  bitmap_initialize (&bb_info->def, &problem_data->word_lr_bitmaps);
+	  bitmap_initialize (&bb_info->in, &problem_data->word_lr_bitmaps);
+	  bitmap_initialize (&bb_info->out, &problem_data->word_lr_bitmaps);
 	}
     }
 
-  df_byte_lr->optional_p = true;
+  df_word_lr->optional_p = true;
 }
 
 
 /* Reset the global solution for recalculation.  */
 
 static void
-df_byte_lr_reset (bitmap all_blocks)
+df_word_lr_reset (bitmap all_blocks)
 {
   unsigned int bb_index;
   bitmap_iterator bi;
 
   EXECUTE_IF_SET_IN_BITMAP (all_blocks, 0, bb_index, bi)
     {
-      struct df_byte_lr_bb_info *bb_info = df_byte_lr_get_bb_info (bb_index);
+      struct df_word_lr_bb_info *bb_info = df_word_lr_get_bb_info (bb_index);
       gcc_assert (bb_info);
       bitmap_clear (&bb_info->in);
       bitmap_clear (&bb_info->out);
     }
 }
 
+/* Examine REF, and if it is for a reg we're interested in, set or
+   clear the bits corresponding to its subwords in the bitmap
+   according to IS_SET.  LIVE is the bitmap we should update.  We do
+   not track hard regs or pseudos of any size other than 2 *
+   UNITS_PER_WORD.
+   Return true if we changed the bitmap, or if we encountered a register
+   we're not tracking.  */
+
+bool
+df_word_lr_mark_ref (df_ref ref, bool is_set, regset live)
+{
+  rtx orig_reg = DF_REF_REG (ref);
+  rtx reg = orig_reg;
+  enum machine_mode reg_mode;
+  unsigned regno;
+  /* Left at -1 for whole accesses.  */
+  int which_subword = -1;
+  bool changed = false;
+
+  if (GET_CODE (reg) == SUBREG)
+    reg = SUBREG_REG (orig_reg);
+  regno = REGNO (reg);
+  reg_mode = GET_MODE (reg);
+  if (regno < FIRST_PSEUDO_REGISTER
+      || GET_MODE_SIZE (reg_mode) != 2 * UNITS_PER_WORD)
+    return true;
+
+  if (GET_CODE (orig_reg) == SUBREG
+      && df_read_modify_subreg_p (orig_reg))
+    {
+      gcc_assert (DF_REF_FLAGS_IS_SET (ref, DF_REF_PARTIAL));
+      if (subreg_lowpart_p (orig_reg))
+	which_subword = 0;
+      else
+	which_subword = 1;
+    }
+  if (is_set)
+    {
+      if (which_subword != 1)
+	changed |= bitmap_set_bit (live, regno * 2);
+      if (which_subword != 0)
+	changed |= bitmap_set_bit (live, regno * 2 + 1);
+    }
+  else
+    {
+      if (which_subword != 1)
+	changed |= bitmap_clear_bit (live, regno * 2);
+      if (which_subword != 0)
+	changed |= bitmap_clear_bit (live, regno * 2 + 1);
+    }
+  return changed;
+}
 
 /* Compute local live register info for basic block BB.  */
 
 static void
-df_byte_lr_bb_local_compute (unsigned int bb_index)
+df_word_lr_bb_local_compute (unsigned int bb_index)
 {
-  struct df_byte_lr_problem_data *problem_data
-    = (struct df_byte_lr_problem_data *)df_byte_lr->problem_data;
   basic_block bb = BASIC_BLOCK (bb_index);
-  struct df_byte_lr_bb_info *bb_info = df_byte_lr_get_bb_info (bb_index);
+  struct df_word_lr_bb_info *bb_info = df_word_lr_get_bb_info (bb_index);
   rtx insn;
   df_ref *def_rec;
   df_ref *use_rec;
 
-  /* Process the registers set in an exception handler.  */
+  /* Ensure that artificial refs don't contain references to pseudos.  */
   for (def_rec = df_get_artificial_defs (bb_index); *def_rec; def_rec++)
     {
       df_ref def = *def_rec;
-      if ((DF_REF_FLAGS (def) & DF_REF_AT_TOP) == 0)
-	{
-	  unsigned int dregno = DF_REF_REGNO (def);
-	  unsigned int start = problem_data->regno_start[dregno];
-	  unsigned int len = problem_data->regno_len[dregno];
-	  bitmap_set_range (&bb_info->def, start, len);
-	  bitmap_clear_range (&bb_info->use, start, len);
-	}
+      gcc_assert (DF_REF_REGNO (def) < FIRST_PSEUDO_REGISTER);
     }
 
-  /* Process the hardware registers that are always live.  */
   for (use_rec = df_get_artificial_uses (bb_index); *use_rec; use_rec++)
     {
       df_ref use = *use_rec;
-      /* Add use to set of uses in this BB.  */
-      if ((DF_REF_FLAGS (use) & DF_REF_AT_TOP) == 0)
-	{
-	  unsigned int uregno = DF_REF_REGNO (use);
-	  unsigned int start = problem_data->regno_start[uregno];
-	  unsigned int len = problem_data->regno_len[uregno];
-	  bitmap_set_range (&bb_info->use, start, len);
-	}
+      gcc_assert (DF_REF_REGNO (use) < FIRST_PSEUDO_REGISTER);
     }
 
   FOR_BB_INSNS_REVERSE (bb, insn)
@@ -2577,7 +2474,6 @@  df_byte_lr_bb_local_compute (unsigned in
 
       if (!INSN_P (insn))
 	continue;
-
       for (def_rec = DF_INSN_UID_DEFS (uid); *def_rec; def_rec++)
 	{
 	  df_ref def = *def_rec;
@@ -2585,164 +2481,80 @@  df_byte_lr_bb_local_compute (unsigned in
 	     not kill the other defs that reach here.  */
 	  if (!(DF_REF_FLAGS (def) & (DF_REF_CONDITIONAL)))
 	    {
-	      unsigned int dregno = DF_REF_REGNO (def);
-	      unsigned int start = problem_data->regno_start[dregno];
-	      unsigned int len = problem_data->regno_len[dregno];
-	      unsigned int sb;
-	      unsigned int lb;
-	      if (!df_compute_accessed_bytes (def, DF_MM_MUST, &sb, &lb))
-		{
-		  start += sb;
-		  len = lb - sb;
-		}
-	      if (len)
-		{
-		  bitmap_set_range (&bb_info->def, start, len);
-		  bitmap_clear_range (&bb_info->use, start, len);
-		}
+	      df_word_lr_mark_ref (def, true, &bb_info->def);
+	      df_word_lr_mark_ref (def, false, &bb_info->use);
 	    }
 	}
-
       for (use_rec = DF_INSN_UID_USES (uid); *use_rec; use_rec++)
 	{
 	  df_ref use = *use_rec;
-	  unsigned int uregno = DF_REF_REGNO (use);
-	  unsigned int start = problem_data->regno_start[uregno];
-	  unsigned int len = problem_data->regno_len[uregno];
-	  unsigned int sb;
-	  unsigned int lb;
-	  if (!df_compute_accessed_bytes (use, DF_MM_MAY, &sb, &lb))
-	    {
-	      start += sb;
-	      len = lb - sb;
-	    }
-	  /* Add use to set of uses in this BB.  */
-	  if (len)
-	    bitmap_set_range (&bb_info->use, start, len);
-	}
-    }
-
-  /* Process the registers set in an exception handler or the hard
-     frame pointer if this block is the target of a non local
-     goto.  */
-  for (def_rec = df_get_artificial_defs (bb_index); *def_rec; def_rec++)
-    {
-      df_ref def = *def_rec;
-      if (DF_REF_FLAGS (def) & DF_REF_AT_TOP)
-	{
-	  unsigned int dregno = DF_REF_REGNO (def);
-	  unsigned int start = problem_data->regno_start[dregno];
-	  unsigned int len = problem_data->regno_len[dregno];
-	  bitmap_set_range (&bb_info->def, start, len);
-	  bitmap_clear_range (&bb_info->use, start, len);
+	  df_word_lr_mark_ref (use, true, &bb_info->use);
 	}
     }
-
-#ifdef EH_USES
-  /* Process the uses that are live into an exception handler.  */
-  for (use_rec = df_get_artificial_uses (bb_index); *use_rec; use_rec++)
-    {
-      df_ref use = *use_rec;
-      /* Add use to set of uses in this BB.  */
-      if (DF_REF_FLAGS (use) & DF_REF_AT_TOP)
-	{
-	  unsigned int uregno = DF_REF_REGNO (use);
-	  unsigned int start = problem_data->regno_start[uregno];
-	  unsigned int len = problem_data->regno_len[uregno];
-	  bitmap_set_range (&bb_info->use, start, len);
-	}
-    }
-#endif
 }
 
 
 /* Compute local live register info for each basic block within BLOCKS.  */
 
 static void
-df_byte_lr_local_compute (bitmap all_blocks ATTRIBUTE_UNUSED)
+df_word_lr_local_compute (bitmap all_blocks ATTRIBUTE_UNUSED)
 {
   unsigned int bb_index;
   bitmap_iterator bi;
 
-  EXECUTE_IF_SET_IN_BITMAP (df_byte_lr->out_of_date_transfer_functions, 0, bb_index, bi)
+  EXECUTE_IF_SET_IN_BITMAP (df_word_lr->out_of_date_transfer_functions, 0, bb_index, bi)
     {
       if (bb_index == EXIT_BLOCK)
 	{
-	  /* The exit block is special for this problem and its bits are
-	     computed from thin air.  */
-	  struct df_byte_lr_bb_info *bb_info = df_byte_lr_get_bb_info (EXIT_BLOCK);
-	  df_byte_lr_expand_bitmap (&bb_info->use, df->exit_block_uses);
+	  unsigned regno;
+	  bitmap_iterator bi;
+	  EXECUTE_IF_SET_IN_BITMAP (df->exit_block_uses, FIRST_PSEUDO_REGISTER,
+				    regno, bi)
+	    gcc_unreachable ();
 	}
       else
-	df_byte_lr_bb_local_compute (bb_index);
+	df_word_lr_bb_local_compute (bb_index);
     }
 
-  bitmap_clear (df_byte_lr->out_of_date_transfer_functions);
+  bitmap_clear (df_word_lr->out_of_date_transfer_functions);
 }
 
 
 /* Initialize the solution vectors.  */
 
 static void
-df_byte_lr_init (bitmap all_blocks)
+df_word_lr_init (bitmap all_blocks)
 {
   unsigned int bb_index;
   bitmap_iterator bi;
 
   EXECUTE_IF_SET_IN_BITMAP (all_blocks, 0, bb_index, bi)
     {
-      struct df_byte_lr_bb_info *bb_info = df_byte_lr_get_bb_info (bb_index);
+      struct df_word_lr_bb_info *bb_info = df_word_lr_get_bb_info (bb_index);
       bitmap_copy (&bb_info->in, &bb_info->use);
       bitmap_clear (&bb_info->out);
     }
 }
 
 
-/* Confluence function that processes infinite loops.  This might be a
-   noreturn function that throws.  And even if it isn't, getting the
-   unwind info right helps debugging.  */
-static void
-df_byte_lr_confluence_0 (basic_block bb)
-{
-  struct df_byte_lr_problem_data *problem_data
-    = (struct df_byte_lr_problem_data *)df_byte_lr->problem_data;
-  bitmap op1 = &df_byte_lr_get_bb_info (bb->index)->out;
-  if (bb != EXIT_BLOCK_PTR)
-    bitmap_copy (op1, &problem_data->hardware_regs_used);
-}
-
-
 /* Confluence function that ignores fake edges.  */
 
 static bool
-df_byte_lr_confluence_n (edge e)
+df_word_lr_confluence_n (edge e)
 {
-  struct df_byte_lr_problem_data *problem_data
-    = (struct df_byte_lr_problem_data *)df_byte_lr->problem_data;
-  bitmap op1 = &df_byte_lr_get_bb_info (e->src->index)->out;
-  bitmap op2 = &df_byte_lr_get_bb_info (e->dest->index)->in;
-  bool changed = false;
-
-  /* Call-clobbered registers die across exception and call edges.  */
-  /* ??? Abnormal call edges ignored for the moment, as this gets
-     confused by sibling call edges, which crashes reg-stack.  */
-  if (e->flags & EDGE_EH)
-    changed = bitmap_ior_and_compl_into (op1, op2,
-					 &problem_data->invalidated_by_call);
-  else
-    changed = bitmap_ior_into (op1, op2);
+  bitmap op1 = &df_word_lr_get_bb_info (e->src->index)->out;
+  bitmap op2 = &df_word_lr_get_bb_info (e->dest->index)->in;
 
-  changed |= bitmap_ior_into (op1, &problem_data->hardware_regs_used);
-  return changed;
+  return bitmap_ior_into (op1, op2);
 }
 
 
 /* Transfer function.  */
 
 static bool
-df_byte_lr_transfer_function (int bb_index)
+df_word_lr_transfer_function (int bb_index)
 {
-  struct df_byte_lr_bb_info *bb_info = df_byte_lr_get_bb_info (bb_index);
+  struct df_word_lr_bb_info *bb_info = df_word_lr_get_bb_info (bb_index);
   bitmap in = &bb_info->in;
   bitmap out = &bb_info->out;
   bitmap use = &bb_info->use;
@@ -2755,86 +2567,83 @@  df_byte_lr_transfer_function (int bb_ind
 /* Free all storage associated with the problem.  */
 
 static void
-df_byte_lr_free (void)
+df_word_lr_free (void)
 {
-  struct df_byte_lr_problem_data *problem_data
-    = (struct df_byte_lr_problem_data *)df_byte_lr->problem_data;
-
+  struct df_word_lr_problem_data *problem_data
+    = (struct df_word_lr_problem_data *)df_word_lr->problem_data;
 
-  if (df_byte_lr->block_info)
+  if (df_word_lr->block_info)
     {
-      df_byte_lr->block_info_size = 0;
-      free (df_byte_lr->block_info);
-      df_byte_lr->block_info = NULL;
+      df_word_lr->block_info_size = 0;
+      free (df_word_lr->block_info);
+      df_word_lr->block_info = NULL;
     }
 
-  BITMAP_FREE (df_byte_lr->out_of_date_transfer_functions);
-  bitmap_obstack_release (&problem_data->byte_lr_bitmaps);
-  free (problem_data->regno_start);
-  free (problem_data->regno_len);
+  BITMAP_FREE (df_word_lr->out_of_date_transfer_functions);
+  bitmap_obstack_release (&problem_data->word_lr_bitmaps);
   free (problem_data);
-  free (df_byte_lr);
+  free (df_word_lr);
 }
 
 
 /* Debugging info at top of bb.  */
 
 static void
-df_byte_lr_top_dump (basic_block bb, FILE *file)
+df_word_lr_top_dump (basic_block bb, FILE *file)
 {
-  struct df_byte_lr_bb_info *bb_info = df_byte_lr_get_bb_info (bb->index);
+  struct df_word_lr_bb_info *bb_info = df_word_lr_get_bb_info (bb->index);
   if (!bb_info)
     return;
 
   fprintf (file, ";; blr  in  \t");
-  df_print_byte_regset (file, &bb_info->in);
+  df_print_word_regset (file, &bb_info->in);
   fprintf (file, ";; blr  use \t");
-  df_print_byte_regset (file, &bb_info->use);
+  df_print_word_regset (file, &bb_info->use);
   fprintf (file, ";; blr  def \t");
-  df_print_byte_regset (file, &bb_info->def);
+  df_print_word_regset (file, &bb_info->def);
 }
 
 
 /* Debugging info at bottom of bb.  */
 
 static void
-df_byte_lr_bottom_dump (basic_block bb, FILE *file)
+df_word_lr_bottom_dump (basic_block bb, FILE *file)
 {
-  struct df_byte_lr_bb_info *bb_info = df_byte_lr_get_bb_info (bb->index);
+  struct df_word_lr_bb_info *bb_info = df_word_lr_get_bb_info (bb->index);
   if (!bb_info)
     return;
 
   fprintf (file, ";; blr  out \t");
-  df_print_byte_regset (file, &bb_info->out);
+  df_print_word_regset (file, &bb_info->out);
 }
 
 
 /* All of the information associated with every instance of the problem.  */
 
-static struct df_problem problem_BYTE_LR =
+static struct df_problem problem_WORD_LR =
 {
-  DF_BYTE_LR,                      /* Problem id.  */
+  DF_WORD_LR,                      /* Problem id.  */
   DF_BACKWARD,                     /* Direction.  */
-  df_byte_lr_alloc,                /* Allocate the problem specific data.  */
-  df_byte_lr_reset,                /* Reset global information.  */
-  df_byte_lr_free_bb_info,         /* Free basic block info.  */
-  df_byte_lr_local_compute,        /* Local compute function.  */
-  df_byte_lr_init,                 /* Init the solution specific data.  */
+  df_word_lr_alloc,                /* Allocate the problem specific data.  */
+  df_word_lr_reset,                /* Reset global information.  */
+  df_word_lr_free_bb_info,         /* Free basic block info.  */
+  df_word_lr_local_compute,        /* Local compute function.  */
+  df_word_lr_init,                 /* Init the solution specific data.  */
   df_worklist_dataflow,            /* Worklist solver.  */
-  df_byte_lr_confluence_0,         /* Confluence operator 0.  */
-  df_byte_lr_confluence_n,         /* Confluence operator n.  */
-  df_byte_lr_transfer_function,    /* Transfer function.  */
+  NULL,                            /* Confluence operator 0.  */
+  df_word_lr_confluence_n,         /* Confluence operator n.  */
+  df_word_lr_transfer_function,    /* Transfer function.  */
   NULL,                            /* Finalize function.  */
-  df_byte_lr_free,                 /* Free all of the problem information.  */
-  df_byte_lr_free,                 /* Remove this problem from the stack of dataflow problems.  */
+  df_word_lr_free,                 /* Free all of the problem information.  */
+  df_word_lr_free,                 /* Remove this problem from the stack of dataflow problems.  */
   NULL,                            /* Debugging.  */
-  df_byte_lr_top_dump,             /* Debugging start block.  */
-  df_byte_lr_bottom_dump,          /* Debugging end block.  */
+  df_word_lr_top_dump,             /* Debugging start block.  */
+  df_word_lr_bottom_dump,          /* Debugging end block.  */
   NULL,                            /* Incremental solution verify start.  */
   NULL,                            /* Incremental solution verify end.  */
   NULL,                       /* Dependent problem.  */
-  sizeof (struct df_byte_lr_bb_info),/* Size of entry of block_info array.  */
-  TV_DF_BYTE_LR,                   /* Timing variable.  */
+  sizeof (struct df_word_lr_bb_info),/* Size of entry of block_info array.  */
+  TV_DF_WORD_LR,                   /* Timing variable.  */
   false                            /* Reset blocks on dropping out of blocks_to_analyze.  */
 };
 
@@ -2844,163 +2653,50 @@  static struct df_problem problem_BYTE_LR
    solution.  */
 
 void
-df_byte_lr_add_problem (void)
+df_word_lr_add_problem (void)
 {
-  df_add_problem (&problem_BYTE_LR);
+  df_add_problem (&problem_WORD_LR);
   /* These will be initialized when df_scan_blocks processes each
      block.  */
-  df_byte_lr->out_of_date_transfer_functions = BITMAP_ALLOC (NULL);
+  df_word_lr->out_of_date_transfer_functions = BITMAP_ALLOC (NULL);
 }
 
 
-/* Simulate the effects of the defs of INSN on LIVE.  */
+/* Simulate the effects of the defs of INSN on LIVE.  Return true if we changed
+   any bits, which is used by the caller to determine whether a set is
+   necessary.  We also return true if there are other reasons not to delete
+   an insn.  */
 
-void
-df_byte_lr_simulate_defs (rtx insn, bitmap live)
+bool
+df_word_lr_simulate_defs (rtx insn, bitmap live)
 {
-  struct df_byte_lr_problem_data *problem_data
-    = (struct df_byte_lr_problem_data *)df_byte_lr->problem_data;
+  bool changed = false;
   df_ref *def_rec;
   unsigned int uid = INSN_UID (insn);
 
   for (def_rec = DF_INSN_UID_DEFS (uid); *def_rec; def_rec++)
     {
       df_ref def = *def_rec;
-
-      /* If the def is to only part of the reg, it does
-	 not kill the other defs that reach here.  */
-      if (!(DF_REF_FLAGS (def) & DF_REF_CONDITIONAL))
-	{
-	  unsigned int dregno = DF_REF_REGNO (def);
-	  unsigned int start = problem_data->regno_start[dregno];
-	  unsigned int len = problem_data->regno_len[dregno];
-	  unsigned int sb;
-	  unsigned int lb;
-	  if (!df_compute_accessed_bytes (def, DF_MM_MUST, &sb, &lb))
-	    {
-	      start += sb;
-	      len = lb - sb;
-	    }
-
-	  if (len)
-	    bitmap_clear_range (live, start, len);
-	}
+      if (DF_REF_FLAGS (def) & DF_REF_CONDITIONAL)
+	changed = true;
+      else
+	changed |= df_word_lr_mark_ref (*def_rec, false, live);
     }
+  return changed;
 }
 
 
 /* Simulate the effects of the uses of INSN on LIVE.  */
 
 void
-df_byte_lr_simulate_uses (rtx insn, bitmap live)
+df_word_lr_simulate_uses (rtx insn, bitmap live)
 {
-  struct df_byte_lr_problem_data *problem_data
-    = (struct df_byte_lr_problem_data *)df_byte_lr->problem_data;
   df_ref *use_rec;
   unsigned int uid = INSN_UID (insn);
 
   for (use_rec = DF_INSN_UID_USES (uid); *use_rec; use_rec++)
-    {
-      df_ref use = *use_rec;
-      unsigned int uregno = DF_REF_REGNO (use);
-      unsigned int start = problem_data->regno_start[uregno];
-      unsigned int len = problem_data->regno_len[uregno];
-      unsigned int sb;
-      unsigned int lb;
-
-      if (!df_compute_accessed_bytes (use, DF_MM_MAY, &sb, &lb))
-	{
-	  start += sb;
-	  len = lb - sb;
-	}
-
-      /* Add use to set of uses in this BB.  */
-      if (len)
-	bitmap_set_range (live, start, len);
-    }
+    df_word_lr_mark_ref (*use_rec, true, live);
 }
-
-
-/* Apply the artificial uses and defs at the top of BB in a forwards
-   direction.  */
-
-void
-df_byte_lr_simulate_artificial_refs_at_top (basic_block bb, bitmap live)
-{
-  struct df_byte_lr_problem_data *problem_data
-    = (struct df_byte_lr_problem_data *)df_byte_lr->problem_data;
-  df_ref *def_rec;
-#ifdef EH_USES
-  df_ref *use_rec;
-#endif
-  int bb_index = bb->index;
-
-#ifdef EH_USES
-  for (use_rec = df_get_artificial_uses (bb_index); *use_rec; use_rec++)
-    {
-      df_ref use = *use_rec;
-      if (DF_REF_FLAGS (use) & DF_REF_AT_TOP)
-	{
-	  unsigned int uregno = DF_REF_REGNO (use);
-	  unsigned int start = problem_data->regno_start[uregno];
-	  unsigned int len = problem_data->regno_len[uregno];
-	  bitmap_set_range (live, start, len);
-	}
-    }
-#endif
-
-  for (def_rec = df_get_artificial_defs (bb_index); *def_rec; def_rec++)
-    {
-      df_ref def = *def_rec;
-      if (DF_REF_FLAGS (def) & DF_REF_AT_TOP)
-	{
-	  unsigned int dregno = DF_REF_REGNO (def);
-	  unsigned int start = problem_data->regno_start[dregno];
-	  unsigned int len = problem_data->regno_len[dregno];
-	  bitmap_clear_range (live, start, len);
-	}
-    }
-}
-
-
-/* Apply the artificial uses and defs at the end of BB in a backwards
-   direction.  */
-
-void
-df_byte_lr_simulate_artificial_refs_at_end (basic_block bb, bitmap live)
-{
-  struct df_byte_lr_problem_data *problem_data
-    = (struct df_byte_lr_problem_data *)df_byte_lr->problem_data;
-  df_ref *def_rec;
-  df_ref *use_rec;
-  int bb_index = bb->index;
-
-  for (def_rec = df_get_artificial_defs (bb_index); *def_rec; def_rec++)
-    {
-      df_ref def = *def_rec;
-      if ((DF_REF_FLAGS (def) & DF_REF_AT_TOP) == 0)
-	{
-	  unsigned int dregno = DF_REF_REGNO (def);
-	  unsigned int start = problem_data->regno_start[dregno];
-	  unsigned int len = problem_data->regno_len[dregno];
-	  bitmap_clear_range (live, start, len);
-	}
-    }
-
-  for (use_rec = df_get_artificial_uses (bb_index); *use_rec; use_rec++)
-    {
-      df_ref use = *use_rec;
-      if ((DF_REF_FLAGS (use) & DF_REF_AT_TOP) == 0)
-	{
-	  unsigned int uregno = DF_REF_REGNO (use);
-	  unsigned int start = problem_data->regno_start[uregno];
-	  unsigned int len = problem_data->regno_len[uregno];
-	  bitmap_set_range (live, start, len);
-	}
-    }
-}
-
-
 
 /*----------------------------------------------------------------------------
    This problem computes REG_DEAD and REG_UNUSED notes.
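
For readers following the df-problems.c changes above, here is a minimal standalone sketch of the encoding that df_word_lr_mark_ref uses: each tracked two-word pseudo gets two liveness bits, regno * 2 for the low word and regno * 2 + 1 for the high word. The sketch also shows how a backward walk can spot a dead subword store from the "changed" return value. It is an illustration only, not GCC code: struct word_live, enum subword, update_bit, mark_ref and high_store_is_dead are hypothetical names, and a plain bool array stands in for GCC's bitmap type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for GCC's bitmap: one flag per tracked subword.  */
enum { MAX_PSEUDOS = 16 };
struct word_live { bool bit[2 * MAX_PSEUDOS]; };

/* Which part of a two-word pseudo a reference touches.  */
enum subword { WHOLE = -1, LOW = 0, HIGH = 1 };

/* Set or clear one flag; return true if its value changed.  */
static bool
update_bit (struct word_live *live, unsigned idx, bool value)
{
  bool changed = live->bit[idx] != value;
  live->bit[idx] = value;
  return changed;
}

/* Same shape as df_word_lr_mark_ref: bit 2*REGNO is the low word,
   bit 2*REGNO+1 the high word; WHOLE touches both.  */
static bool
mark_ref (struct word_live *live, unsigned regno, enum subword which,
          bool is_set)
{
  bool changed = false;
  if (which != HIGH)
    changed |= update_bit (live, regno * 2, is_set);
  if (which != LOW)
    changed |= update_bit (live, regno * 2 + 1, is_set);
  return changed;
}

/* Walk one block backwards.  A later insn reads only the low word of
   pseudo REGNO; before it sit a store to the high word and a store to
   the low word.  Returns true if the high-word store is dead.  */
static bool
high_store_is_dead (unsigned regno)
{
  struct word_live live = { { false } };

  mark_ref (&live, regno, LOW, true);   /* the use of the low word */
  /* Simulating a def clears its bits; no change means nothing was
     live, so the store is a deletion candidate.  */
  bool high_needed = mark_ref (&live, regno, HIGH, false);
  bool low_needed = mark_ref (&live, regno, LOW, false);

  assert (low_needed);
  return !high_needed;
}
```

In this sketch the high-word store clears a bit that was never set, so mark_ref reports no change and the store is flagged as dead. That matches the comment on df_word_lr_simulate_defs about the caller using the return value to decide whether a set is necessary. The real caller has to take other reasons for keeping an insn into account as well, which the sketch omits.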
Index: gcc/lower-subreg.c
===================================================================
--- gcc.orig/lower-subreg.c
+++ gcc/lower-subreg.c
@@ -33,6 +33,7 @@  along with GCC; see the file COPYING3.  
 #include "basic-block.h"
 #include "recog.h"
 #include "bitmap.h"
+#include "dce.h"
 #include "expr.h"
 #include "except.h"
 #include "regs.h"
@@ -1091,6 +1092,9 @@  decompose_multiword_subregs (void)
       return;
   }
 
+  if (df)
+    run_word_dce ();
+
   /* FIXME: When the dataflow branch is merged, we can change this
      code to look for each multi-word pseudo-register and to find each
      insn which sets or uses that register.  That should be faster
Index: gcc/timevar.def
===================================================================
--- gcc.orig/timevar.def
+++ gcc/timevar.def
@@ -91,7 +91,7 @@  DEFTIMEVAR (TV_DF_LR		     , "df live re
 DEFTIMEVAR (TV_DF_LIVE		     , "df live&initialized regs")
 DEFTIMEVAR (TV_DF_UREC		     , "df uninitialized regs 2")
 DEFTIMEVAR (TV_DF_CHAIN		     , "df use-def / def-use chains")
-DEFTIMEVAR (TV_DF_BYTE_LR	     , "df live byte regs")
+DEFTIMEVAR (TV_DF_WORD_LR	     , "df live reg subwords")
 DEFTIMEVAR (TV_DF_NOTE		     , "df reg dead/unused notes")
 DEFTIMEVAR (TV_REG_STATS	     , "register information")
 
Index: gcc/Makefile.in
===================================================================
--- gcc.orig/Makefile.in
+++ gcc/Makefile.in
@@ -3487,7 +3487,7 @@  dbgcnt.o: dbgcnt.c $(CONFIG_H) $(SYSTEM_
 lower-subreg.o : lower-subreg.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    $(MACHMODE_H) $(TM_H) $(RTL_H) $(TM_P_H) $(TIMEVAR_H) $(FLAGS_H) \
    insn-config.h $(BASIC_BLOCK_H) $(RECOG_H) $(OBSTACK_H) $(BITMAP_H) \
-   $(EXPR_H) $(EXCEPT_H) $(REGS_H) $(TREE_PASS_H) $(DF_H)
+   $(EXPR_H) $(EXCEPT_H) $(REGS_H) $(TREE_PASS_H) $(DF_H) dce.h
 target-globals.o : target-globals.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    $(TM_H) insn-config.h $(MACHMODE_H) $(GGC_H) $(TOPLEV_H) target-globals.h \
    $(FLAGS_H) $(REGS_H) $(RTL_H) reload.h expmed.h $(EXPR_H) $(OPTABS_H) \