Patchwork fix target/44606, reload bug on SPE

login
register
mail settings
Submitter Nathan Froyd
Date Sept. 30, 2010, 6:29 p.m.
Message ID <20100930182927.GL32503@codesourcery.com>
Download mbox | patch
Permalink /patch/66234/
State New
Headers show

Comments

Nathan Froyd - Sept. 30, 2010, 6:29 p.m.
PR44606 is bug because GCC tries to be very clever.  Before register
allocation, we have the insns:

(insn 323 319 84 6 pr44606.c:34 (set (reg:DF 750)
        (mem/u/c/i:DF (reg/f:SI 719) [5 S8 A64])) 1503 {*movdf_e500_double} (expr_list:REG_DEAD (reg/f:SI 719)
        (expr_list:REG_EQUAL (const_double:DF 5.0e-1 [0x0.8p+0])
            (nil))))

...

(insn 63 502 491 6 pr44606.c:20 (set (reg/v:SI 221 [ x ])
        (const_int 0 [0x0])) 349 {*movsi_internal1} (nil))

During register allocation, pseudo 750 gets assigned to r27.  Insn 63
receives an output reload, and choose_reload_regs goes looking for an
possibly equivalent register that holds 0.  find_equiv_regs has this
chunk of code when looking for an equivalence:

		  || (goal_const && REG_NOTES (p) != 0
		      && (tem = find_reg_note (p, REG_EQUIV, NULL_RTX))
		      && ((rtx_equal_p (XEXP (tem, 0), goal)
			   && (valueno
			       = true_regnum (valtry = SET_DEST (pat))) >= 0)
			  || (REG_P (SET_DEST (pat))
			      && GET_CODE (XEXP (tem, 0)) == CONST_DOUBLE
			      && SCALAR_FLOAT_MODE_P (GET_MODE (XEXP (tem, 0)))
			      && CONST_INT_P (goal)
			      && 0 != (goaltry
				       = operand_subword (XEXP (tem, 0), 0, 0,
							  VOIDmode))
			      && rtx_equal_p (goal, goaltry)
			      && (valtry
				  = operand_subword (SET_DEST (pat), 0, 0,
						     VOIDmode))
			      && (valueno = true_regnum (valtry)) >= 0)))
		  || (goal_const && (tem = find_reg_note (p, REG_EQUIV,
							  NULL_RTX))
		      && REG_P (SET_DEST (pat))
		      && GET_CODE (XEXP (tem, 0)) == CONST_DOUBLE
		      && SCALAR_FLOAT_MODE_P (GET_MODE (XEXP (tem, 0)))
		      && CONST_INT_P (goal)
		      && 0 != (goaltry = operand_subword (XEXP (tem, 0), 1, 0,
							  VOIDmode))
		      && rtx_equal_p (goal, goaltry)
		      && (valtry
			  = operand_subword (SET_DEST (pat), 1, 0, VOIDmode))
		      && (valueno = true_regnum (valtry)) >= 0)))

The complexity here comes is because we also consider the case where we
have a CONST_DOUBLE somewhere whose low or high part is equal to the
value we're trying to find an equivalence for.  In this case, the low
bits of 0.5 are equal to zero, so find_equiv_regs determines that r27 is
a suitable equivalence.  After register allocation, then, we have:

(insn 323 531 530 6 pr44606.c:34 (set (reg:DF 27 27 [750])
        (mem/u/c/i:DF (reg/f:SI 11 11 [719]) [5 S8 A64])) 1503 {*movdf_e500_double} (expr_list:REG_EQUIV (const_double:DF 5.0e-1 [0x0.8p+0])
        (nil)))

...

(insn 63 547 598 6 ../pr44606.c:20 (set (reg:SI 27 27 [orig:750+4 ] [750])
        (const_int 0 [0x0])) 349 {*movsi_internal1} (expr_list:REG_EQUAL (const_int 0 [0x0])
        (nil)))

and a later DCE determines that insn 323 is no longer needed, deleting
it and resulting in incorrect code at runtime.

The solution Bernd came up with is this: if we find an equivalent
register for an output reload, we remember it in reload_override_in
rather than reg_rtx.  (We can't consult reload_inherited for an
inherited output reload because this bit of code in choose_reload_regs:

	  if (! free_for_value_p (true_regnum (check_reg), rld[r].mode,
				  rld[r].opnum, rld[r].when_needed, rld[r].in,
				  (reload_inherited[r]
				   ? rld[r].out : const0_rtx),
				  r, 1))
	    {
	      if (pass)
		continue;
	      reload_inherited[r] = 0;
	      reload_override_in[r] = 0;
	    }

triggers: free_for_value_p says that r27 is not free, which it isn't.)
Once we have the, the fix becomes straightforward: emit_reload_insns can
check to see if we inherited an output reload.  If we did, then there's
no point in keeping that insn around, so we might as well delete it,
which is what reload_as_needed has bee modified to do.

Bootstrapping in progress on x86-64.  OK to commit and backport to 4.5
and 4.4?

-Nathan

gcc/
	* reload1.c (emit_reload_insns): Adjust prototype.  Check for
	inherited output reloads.
	(reload_as_needed): Delete insn if emit_reload_insns returns
	true.
	(choose_reload_regs): Save equiv in reload_override_in for
	output reloads.  Set reg_rtx from reload_override_in.

gcc/testsuite/
	PR target/44606
	* gcc.dg/pr44606.c: New.
Bernd Schmidt - Sept. 30, 2010, 8:49 p.m.
On 09/30/2010 08:29 PM, Nathan Froyd wrote:

> The solution Bernd came up with is this: if we find an equivalent
> register for an output reload, we remember it in reload_override_in
> rather than reg_rtx.
[...]
> Once we have the, the fix becomes straightforward: emit_reload_insns can
> check to see if we inherited an output reload.  If we did, then there's
> no point in keeping that insn around, so we might as well delete it,
> which is what reload_as_needed has bee modified to do.
> 
> Bootstrapping in progress on x86-64.  OK to commit and backport to 4.5
> and 4.4?

Thanks for debugging and submitting the patch.  I think ideally we'd get
a second opinion on this, but if no one comments within a week you can
check it in.


Bernd
Eric Botcazou - Sept. 30, 2010, 10:16 p.m.
> PR44606 is bug because GCC tries to be very clever.  Before register
> allocation, we have the insns:
>
> (insn 323 319 84 6 pr44606.c:34 (set (reg:DF 750)
>         (mem/u/c/i:DF (reg/f:SI 719) [5 S8 A64])) 1503 {*movdf_e500_double}
> (expr_list:REG_DEAD (reg/f:SI 719) (expr_list:REG_EQUAL (const_double:DF
> 5.0e-1 [0x0.8p+0])
>             (nil))))
>
> ...
>
> (insn 63 502 491 6 pr44606.c:20 (set (reg/v:SI 221 [ x ])
>         (const_int 0 [0x0])) 349 {*movsi_internal1} (nil))
>
> During register allocation, pseudo 750 gets assigned to r27.  Insn 63
> receives an output reload, and choose_reload_regs goes looking for an
> possibly equivalent register that holds 0.  find_equiv_regs has this
>
> chunk of code when looking for an equivalence:
> 		  || (goal_const && REG_NOTES (p) != 0
>
> 		      && (tem = find_reg_note (p, REG_EQUIV, NULL_RTX))
> 		      && ((rtx_equal_p (XEXP (tem, 0), goal)
> 			   && (valueno
> 			       = true_regnum (valtry = SET_DEST (pat))) >= 0)
>
> 			  || (REG_P (SET_DEST (pat))
>
> 			      && GET_CODE (XEXP (tem, 0)) == CONST_DOUBLE
> 			      && SCALAR_FLOAT_MODE_P (GET_MODE (XEXP (tem, 0)))
> 			      && CONST_INT_P (goal)
> 			      && 0 != (goaltry
> 				       = operand_subword (XEXP (tem, 0), 0, 0,
> 							  VOIDmode))
> 			      && rtx_equal_p (goal, goaltry)
> 			      && (valtry
> 				  = operand_subword (SET_DEST (pat), 0, 0,
> 						     VOIDmode))
> 			      && (valueno = true_regnum (valtry)) >= 0)))
>
> 		  || (goal_const && (tem = find_reg_note (p, REG_EQUIV,
>
> 							  NULL_RTX))
> 		      && REG_P (SET_DEST (pat))
> 		      && GET_CODE (XEXP (tem, 0)) == CONST_DOUBLE
> 		      && SCALAR_FLOAT_MODE_P (GET_MODE (XEXP (tem, 0)))
> 		      && CONST_INT_P (goal)
> 		      && 0 != (goaltry = operand_subword (XEXP (tem, 0), 1, 0,
> 							  VOIDmode))
> 		      && rtx_equal_p (goal, goaltry)
> 		      && (valtry
> 			  = operand_subword (SET_DEST (pat), 1, 0, VOIDmode))
> 		      && (valueno = true_regnum (valtry)) >= 0)))
>
> The complexity here comes is because we also consider the case where we
> have a CONST_DOUBLE somewhere whose low or high part is equal to the
> value we're trying to find an equivalence for.  In this case, the low
> bits of 0.5 are equal to zero, so find_equiv_regs determines that r27 is
> a suitable equivalence.  After register allocation, then, we have:
>
> (insn 323 531 530 6 pr44606.c:34 (set (reg:DF 27 27 [750])
>         (mem/u/c/i:DF (reg/f:SI 11 11 [719]) [5 S8 A64])) 1503
> {*movdf_e500_double} (expr_list:REG_EQUIV (const_double:DF 5.0e-1
> [0x0.8p+0]) (nil)))
>
> ...
>
> (insn 63 547 598 6 ../pr44606.c:20 (set (reg:SI 27 27 [orig:750+4 ] [750])
>         (const_int 0 [0x0])) 349 {*movsi_internal1} (expr_list:REG_EQUAL
> (const_int 0 [0x0]) (nil)))
>
> and a later DCE determines that insn 323 is no longer needed, deleting
> it and resulting in incorrect code at runtime.

What's incorrect exactly?

> Bootstrapping in progress on x86-64.  OK to commit and backport to 4.5
> and 4.4?
>
> -Nathan
>
> gcc/
> 	* reload1.c (emit_reload_insns): Adjust prototype.  Check for
> 	inherited output reloads.

The return value must be documented.

> 	(reload_as_needed): Delete insn if emit_reload_insns returns
> 	true.
> 	(choose_reload_regs): Save equiv in reload_override_in for
> 	output reloads.  Set reg_rtx from reload_override_in.

I think that deleting new insns in reload is too risky on the branches.  Can't 
we tighten the above condition instead on the branches so that it will return 
false in this case?
Andrew Pinski - Sept. 30, 2010, 10:23 p.m.
On Thu, Sep 30, 2010 at 3:16 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> (insn 323 531 530 6 pr44606.c:34 (set (reg:DF 27 27 [750])
>>         (mem/u/c/i:DF (reg/f:SI 11 11 [719]) [5 S8 A64])) 1503
>> {*movdf_e500_double} (expr_list:REG_EQUIV (const_double:DF 5.0e-1
>> [0x0.8p+0]) (nil)))
>>
>> ...
>>
>> (insn 63 547 598 6 ../pr44606.c:20 (set (reg:SI 27 27 [orig:750+4 ] [750])
>>         (const_int 0 [0x0])) 349 {*movsi_internal1} (expr_list:REG_EQUAL
>> (const_int 0 [0x0]) (nil)))
>>
>> and a later DCE determines that insn 323 is no longer needed, deleting
>> it and resulting in incorrect code at runtime.
>
> What's incorrect exactly?

The upper part of (reg:DF 27 27 [750]) would not contain the correct
value between the setting of the register to 0.5 and setting the lower
(SI) part to 0.  I think he forgot to mention (reg:DF 27 27 [750]) is
used between those two instructions.

Thanks,
Andrew Pinski
Bernd Schmidt - Sept. 30, 2010, 10:35 p.m.
On 10/01/2010 12:16 AM, Eric Botcazou wrote:
>> After register allocation, then, we have:
>>
>> (insn 323 531 530 6 pr44606.c:34 (set (reg:DF 27 27 [750])
>>         (mem/u/c/i:DF (reg/f:SI 11 11 [719]) [5 S8 A64])) 1503
>> {*movdf_e500_double} (expr_list:REG_EQUIV (const_double:DF 5.0e-1
>> [0x0.8p+0]) (nil)))
>>
>> ...
>>
>> (insn 63 547 598 6 ../pr44606.c:20 (set (reg:SI 27 27 [orig:750+4 ] [750])
>>         (const_int 0 [0x0])) 349 {*movsi_internal1} (expr_list:REG_EQUAL
>> (const_int 0 [0x0]) (nil)))
>>
>> and a later DCE determines that insn 323 is no longer needed, deleting
>> it and resulting in incorrect code at runtime.
> 
> What's incorrect exactly?

Rather than reusing the portion of reg:DF 27 which holds SImode zero, it
clobbered all of reg:DF 27, which is live past insn 63.  This piece of code:
	  /* If this is an output reload from a simple move insn, look
	     if an equivalence for the input is available.  */
	  else if (inheritance && rld[r].in == 0 && rld[r].out != 0)
	    {
	      rtx set = single_set (insn);

	      if (set
		  && rtx_equal_p (rld[r].out, SET_DEST (set))
		  && CONSTANT_P (SET_SRC (set)))
		search_equiv = SET_SRC (set);
	    }

made us look for the equivalence.  If we find one, and use it, the
original insn (#63 in this case) should be deleted, since it would just
load up the same value we already found (and in this case, clobber other
parts of the register).


Bernd
Nathan Froyd - Sept. 30, 2010, 11:24 p.m.
On Fri, Oct 01, 2010 at 12:35:16AM +0200, Bernd Schmidt wrote:
> On 10/01/2010 12:16 AM, Eric Botcazou wrote:
> >> After register allocation, then, we have:
> >>
> >> (insn 323 531 530 6 pr44606.c:34 (set (reg:DF 27 27 [750])
> >>         (mem/u/c/i:DF (reg/f:SI 11 11 [719]) [5 S8 A64])) 1503
> >> {*movdf_e500_double} (expr_list:REG_EQUIV (const_double:DF 5.0e-1
> >> [0x0.8p+0]) (nil)))
> >>
> >> ...
> >>
> >> (insn 63 547 598 6 ../pr44606.c:20 (set (reg:SI 27 27 [orig:750+4 ] [750])
> >>         (const_int 0 [0x0])) 349 {*movsi_internal1} (expr_list:REG_EQUAL
> >> (const_int 0 [0x0]) (nil)))
> >>
> >> and a later DCE determines that insn 323 is no longer needed, deleting
> >> it and resulting in incorrect code at runtime.
> > 
> > What's incorrect exactly?
> 
> Rather than reusing the portion of reg:DF 27 which holds SImode zero, it
> clobbered all of reg:DF 27, which is live past insn 63.  This piece of code:
> 	  /* If this is an output reload from a simple move insn, look
> 	     if an equivalence for the input is available.  */
> 	  else if (inheritance && rld[r].in == 0 && rld[r].out != 0)
> 	    {
> 	      rtx set = single_set (insn);
> 
> 	      if (set
> 		  && rtx_equal_p (rld[r].out, SET_DEST (set))
> 		  && CONSTANT_P (SET_SRC (set)))
> 		search_equiv = SET_SRC (set);
> 	    }
> 
> made us look for the equivalence.  If we find one, and use it, the
> original insn (#63 in this case) should be deleted, since it would just
> load up the same value we already found (and in this case, clobber other
> parts of the register).

Technically, it's OK that reload did this.  Regular 32-bit PPC insns do
not affect the high bits of the register on E500.  If we stopped after
register allocation, we'd be OK.  The problems arise later, when
dead code elimination sees:

  r27:DF = [stuff] ; load 0.5 from memory
  ...              ; no intervening use of r27
  r27:SI = 0       ; DF thinks the original load is dead here, and
                   ; deletes it, which means later code is finding
                   ; and using garbage in the high bits of r27.

-Nathan
Andrew Pinski - Sept. 30, 2010, 11:29 p.m.
On Thu, Sep 30, 2010 at 4:24 PM, Nathan Froyd <froydnj@codesourcery.com> wrote:
> Technically, it's OK that reload did this.  Regular 32-bit PPC insns do
> not affect the high bits of the register on E500.  If we stopped after
> register allocation, we'd be OK.  The problems arise later, when
> dead code elimination sees:
>
>  r27:DF = [stuff] ; load 0.5 from memory
>  ...              ; no intervening use of r27
>  r27:SI = 0       ; DF thinks the original load is dead here, and
>                   ; deletes it, which means later code is finding
>                   ; and using garbage in the high bits of r27.

Then this sounds like a bug in DF/DCE rather than reload.

-- Pinski
Bernd Schmidt - Sept. 30, 2010, 11:39 p.m.
On 10/01/2010 01:29 AM, Andrew Pinski wrote:
> On Thu, Sep 30, 2010 at 4:24 PM, Nathan Froyd <froydnj@codesourcery.com> wrote:
>> Technically, it's OK that reload did this.  Regular 32-bit PPC insns do
>> not affect the high bits of the register on E500.  If we stopped after
>> register allocation, we'd be OK.  The problems arise later, when
>> dead code elimination sees:
>>
>>  r27:DF = [stuff] ; load 0.5 from memory
>>  ...              ; no intervening use of r27
>>  r27:SI = 0       ; DF thinks the original load is dead here, and
>>                   ; deletes it, which means later code is finding
>>                   ; and using garbage in the high bits of r27.
> 
> Then this sounds like a bug in DF/DCE rather than reload.

A normal set is always considered to clobber the entire register.


Bernd
Eric Botcazou - Oct. 1, 2010, 8:13 a.m.
> > What's incorrect exactly?
>
> The upper part of (reg:DF 27 27 [750]) would not contain the correct
> value between the setting of the register to 0.5 and setting the lower
> (SI) part to 0.  I think he forgot to mention (reg:DF 27 27 [750]) is
> used between those two instructions.

Between the instructions?  That would be a big DCE bug then.  Is that not 
rather after both instructions, in which case patching reload would indeed 
seem to be the correct thing to do.
Nathan Froyd - Oct. 1, 2010, 4:33 p.m.
On Fri, Oct 01, 2010 at 12:16:50AM +0200, Eric Botcazou wrote:
> > 	(reload_as_needed): Delete insn if emit_reload_insns returns
> > 	true.
> > 	(choose_reload_regs): Save equiv in reload_override_in for
> > 	output reloads.  Set reg_rtx from reload_override_in.
> 
> I think that deleting new insns in reload is too risky on the branches.  Can't 
> we tighten the above condition instead on the branches so that it will return 
> false in this case?

I don't see a good way to tighten the above condition.  Deleting the
cleverness is an option, though. ;)

The other possibility is to teach postreload about this sort of thing
and have it delete the second insn.

-Nathan
Eric Botcazou - Oct. 1, 2010, 9:38 p.m.
> I don't see a good way to tighten the above condition.  Deleting the
> cleverness is an option, though. ;)

How about adding a check that the modes have the same size when this is for an 
output reload?
Bernd Schmidt - Oct. 1, 2010, 11:16 p.m.
On 10/01/2010 11:38 PM, Eric Botcazou wrote:
>> I don't see a good way to tighten the above condition.  Deleting the
>> cleverness is an option, though. ;)
> 
> How about adding a check that the modes have the same size when this is for an 
> output reload?

That's just papering over the real problem. I think the intent is
clearly that if we find an equivalence, the move insn being reloaded is
removed (as it serves no purpose if we already have the value in a
register) and only the output reload remains.
For the release branches it's conceivable to disable this optimization,
it shouldn't trigger that often anyway.


Bernd
Bernd Schmidt - Oct. 2, 2010, 12:45 a.m.
On 10/02/2010 01:16 AM, Bernd Schmidt wrote:
> For the release branches it's conceivable to disable this optimization,
> it shouldn't trigger that often anyway.

Actually, maybe it's best to just do that everywhere.  It doesn't seem
to trigger in an i686-linux bootstrap, and it's something we can clean
up in postreload if we really want to.


Bernd
Nathan Froyd - Oct. 2, 2010, 12:50 a.m.
On Sat, Oct 02, 2010 at 02:45:08AM +0200, Bernd Schmidt wrote:
> On 10/02/2010 01:16 AM, Bernd Schmidt wrote:
> > For the release branches it's conceivable to disable this optimization,
> > it shouldn't trigger that often anyway.
> 
> Actually, maybe it's best to just do that everywhere.  It doesn't seem
> to trigger in an i686-linux bootstrap, and it's something we can clean
> up in postreload if we really want to.

I'd be really surprised if it triggered in a bootstrap anywhere, given
that GCC's floating-point usage is practically zero.

I honestly can't think of many places it'd help: soft-float targets and
slightly odd targets like E500 are the only ones that come to mind.  Any
target that permits SImode subregs of DFmode registers would benefit, I
suppose, but even then the stars have to align...

-Nathan
Bernd Schmidt - Oct. 2, 2010, 12:55 a.m.
On 10/02/2010 02:50 AM, Nathan Froyd wrote:

> I'd be really surprised if it triggered in a bootstrap anywhere, given
> that GCC's floating-point usage is practically zero.
> 
> I honestly can't think of many places it'd help: soft-float targets and
> slightly odd targets like E500 are the only ones that come to mind.  Any
> target that permits SImode subregs of DFmode registers would benefit, I
> suppose, but even then the stars have to align...

I mean not just the specific problem that was responsible for the bug,
but the entire case where we try to find an equivalence for an output
reload and actually keep it.  I put an additional abort in the patch and
successfully bootstrapped it.


Bernd
Eric Botcazou - Oct. 4, 2010, 10:39 a.m.
> That's just papering over the real problem. I think the intent is
> clearly that if we find an equivalence, the move insn being reloaded is
> removed (as it serves no purpose if we already have the value in a
> register) and only the output reload remains.
> For the release branches it's conceivable to disable this optimization,
> it shouldn't trigger that often anyway.

As already mentioned, my proposal was for the release branches only.

Patch

Index: testsuite/gcc.dg/pr44606.c
===================================================================
--- testsuite/gcc.dg/pr44606.c	(revision 0)
+++ testsuite/gcc.dg/pr44606.c	(revision 0)
@@ -0,0 +1,53 @@ 
+/* PR target/44606 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+typedef struct file FILE;
+extern FILE *stderr;
+extern int fprintf (FILE *, const char *, ...);
+extern void abort (void);
+
+ typedef struct _PixelPacket { 	unsigned char r, g, b; }
+ PixelPacket;
+#define ARRAYLEN(X) (sizeof(X)/sizeof(X[0]))
+PixelPacket q[6];
+#define COLS (ARRAYLEN(q) - 1)
+PixelPacket p[2*COLS + 22];
+#define Minify(POS, WEIGHT) do {	\
+	total_r += (WEIGHT)*(p[POS].r);	\
+	total_g += (WEIGHT)*(p[POS].g);	\
+	total_b += (WEIGHT)*(p[POS].b);	\
+} while (0)
+unsigned long columns = COLS;
+int main(void)
+{
+	static const unsigned char answers[COLS] = { 31, 32, 34, 35, 36 };
+	unsigned long x;
+	for (x = 0; x < sizeof(p)/sizeof(p[0]); x++) {
+		p[x].b = (x + 34) | 1;
+	}
+	for (x = 0; x < columns; x++) {
+		double total_r = 0, total_g = 0, total_b = 0;
+		double saved_r = 0, saved_g = 0, saved_b = 0;
+		Minify(2*x +  0,  3.0);
+		Minify(2*x +  1,  7.0);
+		Minify(2*x +  2,  7.0);
+		saved_r = total_r;
+		saved_g = total_g;
+		Minify(2*x + 11, 15.0);
+		Minify(2*x + 12,  7.0);
+		Minify(2*x + 18,  7.0);
+		Minify(2*x + 19, 15.0);
+		Minify(2*x + 20, 15.0);
+		Minify(2*x + 21,  7.0);
+		q[x].r = (unsigned char)(total_r/128.0 + 0.5);
+		q[x].g = (unsigned char)(total_g/128.0 + 0.5);
+		q[x].b = (unsigned char)(total_b/128.0 + 0.5);
+		fprintf(stderr, "r:%f g:%f b:%f\n", saved_r, saved_g, saved_b);
+	}
+	for (x = 0; x < COLS; x++) {
+		if (answers[x] != q[x].b)
+			abort();
+	}
+	return 0;
+}
Index: testsuite/ChangeLog
===================================================================
Index: reload1.c
===================================================================
--- reload1.c	(revision 164751)
+++ reload1.c	(working copy)
@@ -441,7 +441,7 @@  static void emit_output_reload_insns (st
 				      int);
 static void do_input_reload (struct insn_chain *, struct reload *, int);
 static void do_output_reload (struct insn_chain *, struct reload *, int);
-static void emit_reload_insns (struct insn_chain *);
+static bool emit_reload_insns (struct insn_chain *);
 static void delete_output_reload (rtx, int, int, rtx);
 static void delete_address_reloads (rtx, rtx);
 static void delete_address_reloads_1 (rtx, rtx, rtx);
@@ -4590,6 +4590,7 @@  reload_as_needed (int live_known)
 	    {
 	      rtx next = NEXT_INSN (insn);
 	      rtx p;
+	      bool delete_this;
 
 	      prev = PREV_INSN (insn);
 
@@ -4601,7 +4602,7 @@  reload_as_needed (int live_known)
 
 	      /* Generate the insns to reload operands into or out of
 		 their reload regs.  */
-	      emit_reload_insns (chain);
+	      delete_this = emit_reload_insns (chain);
 
 	      /* Substitute the chosen reload regs from reload_reg_rtx
 		 into the insn's body (or perhaps into the bodies of other
@@ -4628,6 +4629,10 @@  reload_as_needed (int live_known)
 				     "impossible reload");
 		      delete_insn (p);
 		    }
+	      /* emit_reload_insns may have indicated this insn is
+		 unnecessary due to inherited output reloads.  */
+	      if (delete_this)
+		delete_insn (insn);
 	    }
 
 	  if (num_eliminable && chain->need_elim)
@@ -5695,8 +5700,9 @@  static char reload_inherited[MAX_RELOADS
    if we know it.  Otherwise, this is 0.  */
 static rtx reload_inheritance_insn[MAX_RELOADS];
 
-/* If nonzero, this is a place to get the value of the reload,
-   rather than using reload_in.  */
+/* If nonzero, this is a place to get the value of the reload, rather
+   than using reload_in.  For output reloads, this holds the equivalent
+   register if the reload is inherited.  */
 static rtx reload_override_in[MAX_RELOADS];
 
 /* For each reload, the hard register number of the register used,
@@ -6741,7 +6747,10 @@  choose_reload_regs (struct insn_chain *c
 		{
 		  int nr = hard_regno_nregs[regno][rld[r].mode];
 		  int k;
-		  rld[r].reg_rtx = equiv;
+		  if (rld[r].out && !rld[r].in)
+		    reload_override_in[r] = equiv;
+		  else
+		    rld[r].reg_rtx = equiv;
 		  reload_spill_index[r] = regno;
 		  reload_inherited[r] = 1;
 
@@ -6916,7 +6925,12 @@  choose_reload_regs (struct insn_chain *c
      actually override reload_in.  */
   for (j = 0; j < n_reloads; j++)
     if (reload_override_in[j])
-      rld[j].in = reload_override_in[j];
+      {
+	if (rld[j].in)
+	  rld[j].in = reload_override_in[j];
+	else
+	  rld[j].reg_rtx = reload_override_in[j];
+      }
 
   /* If this reload won't be done because it has been canceled or is
      optional and not inherited, clear reload_reg_rtx so other
@@ -7947,7 +7961,7 @@  inherit_piecemeal_p (int dest ATTRIBUTE_
 
 /* Output insns to reload values in and out of the chosen reload regs.  */
 
-static void
+static bool
 emit_reload_insns (struct insn_chain *chain)
 {
   rtx insn = chain->insn;
@@ -8370,6 +8384,10 @@  emit_reload_insns (struct insn_chain *ch
 	}
     }
   IOR_HARD_REG_SET (reg_reloaded_dead, reg_reloaded_died);
+  for (j = 0; j < n_reloads; j++)
+    if (!rld[j].in && rld[j].out && reload_override_in[j])
+      return true;
+  return false;
 }
 
 /* Go through the motions to emit INSN and test if it is strictly valid.