RFA: Simplifying truncation and integer lowpart subregs

Message ID 87k3v3oq72.fsf_-_@talisman.home
State New

Commit Message

Richard Sandiford Oct. 6, 2012, 10:22 a.m. UTC
[cc:ing sh, spu and tilegx maintainers]

Richard Sandiford <rdsandiford@googlemail.com> writes:
> Andrew Pinski <andrew.pinski@caviumnetworks.com> writes:
>> On Thu, Sep 27, 2012 at 11:13 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> 2012-09-27  Uros Bizjak  <ubizjak@gmail.com>
>>>
>>>         PR rtl-optimization/54457
>>>         * simplify-rtx.c (simplify_subreg):
>>>         Simplify (subreg:M (op:N ((x:N) (y:N)), 0)
>>>         to (op:M (subreg:M (x:N) 0) (subreg:M (y:N) 0)), where
>>>         the outer subreg is effectively a truncation to the original mode M.
>>
>>
>> When I was doing something similar on our internal toolchain at
>> Cavium, I found that doing this caused a regression on MIPS64 n32 in
>> gcc.c-torture/execute/20040709-1.c, where:
>>
>>
>> (insn 15 14 16 2 (set (reg/v:DI 200 [ y ])
>>         (reg:DI 2 $2)) t.c:16 301 {*movdi_64bit}
>>      (expr_list:REG_DEAD (reg:DI 2 $2)
>>         (nil)))
>>
>> (insn 16 15 17 2 (set (reg:DI 210)
>>         (zero_extract:DI (reg/v:DI 200 [ y ])
>>             (const_int 29 [0x1d])
>>             (const_int 0 [0]))) t.c:16 249 {extzvdi}
>>      (expr_list:REG_DEAD (reg/v:DI 200 [ y ])
>>         (nil)))
>>
>> (insn 17 16 23 2 (set (reg:SI 211)
>>         (truncate:SI (reg:DI 210))) t.c:16 175 {truncdisi2}
>>      (expr_list:REG_DEAD (reg:DI 210)
>>         (nil)))
>>
>> Gets converted to:
>> (insn 23 17 26 2 (set (reg/i:SI 2 $2)
>>         (and:SI (reg:SI 2 $2 [+4 ])
>>             (const_int 536870911 [0x1fffffff]))) t.c:18 156 {*andsi3}
>>      (nil))
>>
>> which is considered an ext instruction.
>>
>> The Octeon simulator makes undefined arguments to 32-bit word
>> operations come out as 0xDEADBEEF, which exposed the regression.
>> I fixed it by changing it to produce TRUNCATE instead of the subreg.
>>
>> I did the simplification on ior/and rather than plus/minus/mult, so the
>> issue only arises when expanding this to and/ior.
>
> Hmm, hadn't thought of that.  I think some of the existing subreg
> optimisations suffer the same problem.  I.e. we can't assume that
> subreg truncations of nested operands are OK just because the outer
> subreg is OK.
>
> I've got a patch I'm testing.
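
For illustration only, here is a minimal C sketch of the hazard Andrew
describes.  It is not taken from GCC or from the Octeon simulator.  It
assumes a MIPS64-like convention in which a 64-bit register holds a valid
SImode value only when bits 63..32 are copies of bit 31; TRUNCATE is
modelled as re-establishing that invariant and a lowpart SUBREG as reusing
the register unchanged.  All names (model_truncate, model_subreg,
model_and_si, POISON) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value the modelled simulator returns for an ill-formed 32-bit input.  */
#define POISON UINT32_C(0xDEADBEEF)

/* Model of TRUNCATE DI->SI: keep the low 32 bits and sign-extend them,
   so that bits 63..32 become copies of bit 31.  */
uint64_t
model_truncate (uint64_t reg)
{
  uint64_t low = reg & UINT64_C(0xffffffff);
  return (low ^ UINT64_C(0x80000000)) - UINT64_C(0x80000000);
}

/* Model of a lowpart SUBREG: the register is reused unchanged.  */
uint64_t
model_subreg (uint64_t reg)
{
  return reg;
}

/* A register is a valid SImode input if truncating it changes nothing.  */
bool
is_valid_si (uint64_t reg)
{
  return model_truncate (reg) == reg;
}

/* Model of a 32-bit AND: poison the result for ill-formed inputs.  */
uint32_t
model_and_si (uint64_t reg, uint32_t mask)
{
  if (!is_valid_si (reg))
    return POISON;
  return (uint32_t) reg & mask;
}

/* Compare the two lowerings of a 29-bit zero_extract of Y.  */
void
check_truncate_vs_subreg (void)
{
  uint64_t y = UINT64_C(0x112345678);	/* Upper half is not a sign copy.  */
  uint32_t mask = UINT32_C(0x1fffffff);

  assert (!is_valid_si (y));
  assert (model_and_si (model_subreg (y), mask) == POISON);
  assert (model_and_si (model_truncate (y), mask) == (uint32_t) (y & mask));
}
```

Under these assumptions the SUBREG path feeds an ill-formed register to the
32-bit AND, while the TRUNCATE path produces the same low 29 bits as the
original DImode zero_extract.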

The idea is to split most of the lowpart subreg handling out of
simplify_subreg and apply it to TRUNCATE too.  There are three reasons:

- I wanted to make the !TRULY_NOOP_TRUNCATION truncation simplifications
  as similar to subreg truncation simplifications as possible.

- Some of the current lowpart subreg simplifications are also correct
  for vector truncations.

- Ideally, using simplify_gen_unary (TRUNCATE, ...) instead of
  simplify_gen_subreg shouldn't penalise TRULY_NOOP_TRUNCATION targets.
  There is already code to use gen_lowpart_no_emit for truncations that
  reduce to subregs, but as things stand, gen_lowpart_no_emit only
  passes objects like SUBREG, REG, MEM, etc., to simplify_gen_subreg;
  others go through gen_lowpart_SUBREG and get no recursive simplification.

We inherited this code from a 1996 patch (r13058):

      if ((TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op))
	   ? (num_sign_bit_copies (op, GET_MODE (op))
	      > (unsigned int) (GET_MODE_PRECISION (GET_MODE (op))
				- GET_MODE_PRECISION (mode)))
        ...
 	return rtl_hooks.gen_lowpart_no_emit (mode, op);

I don't see any reason for the sign-bit check.  If truncations are noops,
we should be able to use a subreg regardless.
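
As a rough illustration of that argument (a sketch on concrete values, not
GCC's RTL-level analysis): if truncation is a no-op, both TRUNCATE and a
lowpart SUBREG simply read the low bits, so the two agree whether or not
the old sign-bit test would have passed.  The helper names below are
hypothetical, and model_num_sign_bit_copies only mimics the idea behind
num_sign_bit_copies for a single 64-bit value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count the high-order bits of X that equal its sign bit, including
   the sign bit itself.  */
unsigned int
model_num_sign_bit_copies (uint64_t x)
{
  unsigned int sign = (unsigned int) (x >> 63);
  unsigned int copies = 1;

  for (int bit = 62; bit >= 0; bit--)
    {
      if (((x >> bit) & 1) != sign)
	break;
      copies++;
    }
  return copies;
}

/* The condition the old code required for a DI->SI truncation:
   more sign-bit copies than the number of bits being dropped.  */
bool
old_sign_bit_test (uint64_t x)
{
  return model_num_sign_bit_copies (x) > 64u - 32u;
}

/* On a no-op-truncation target, TRUNCATE just reads the low 32 bits...  */
uint32_t
model_noop_truncate (uint64_t x)
{
  return (uint32_t) x;
}

/* ...and so does a lowpart SUBREG.  */
uint32_t
model_lowpart (uint64_t x)
{
  return (uint32_t) (x & UINT64_C(0xffffffff));
}
```

For example, 0x180000000 has only 31 sign-bit copies and fails the old
test, yet model_noop_truncate and model_lowpart still return the same
value for it.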

Other than removing that check, the patch just moves simplifications around.
I've not tried to match new patterns.

The other !TRULY_NOOP_TRUNCATION targets are sh64, spu and tilegx.
I don't think sh64 has any patterns that would be adversely affected,
although the patch ought to make these patterns redundant:

(define_insn_and_split "*logical_sidisi3"
  [(set (match_operand:SI 0 "arith_reg_dest" "=r,r")
	(truncate:SI (sign_extend:DI
			(match_operator:SI 3 "logical_operator"
			  [(match_operand:SI 1 "arith_reg_operand" "%r,r")
			   (match_operand:SI 2 "logical_operand" "r,I10")]))))]
  "TARGET_SHMEDIA"
  "#"
  "&& 1"
  [(set (match_dup 0) (match_dup 3))])

(define_insn_and_split "*logical_sidi3_2"
  [(set (match_operand:DI 0 "arith_reg_dest" "=r,r")
	(sign_extend:DI (truncate:SI (sign_extend:DI
			(match_operator:SI 3 "logical_operator"
			  [(match_operand:SI 1 "arith_reg_operand" "%r,r")
			   (match_operand:SI 2 "logical_operand" "r,I10")])))))]
  "TARGET_SHMEDIA"
  "#"
  "&& 1"
  [(set (match_dup 0) (sign_extend:DI (match_dup 3)))])

combine should now simplify the first to the normal SI logical op
and the second to *logical_sidisi3.  I don't think any spu or tilegx
patterns are affected either way.

Tested on x86_64-linux-gnu, mipsisa32-elf and mipsisa64-elf.  Also tested
by making sure that there were no code differences for a set of gcc .ii
files on gcc20 (-O2 -march=native).  OK to install?

Richard


gcc/
	* machmode.h (GET_MODE_UNIT_PRECISION): New macro.
	* simplify-rtx.c (simplify_truncation): New function.
	(simplify_unary_operation_1): Use it.  Remove sign bit test
	for !TRULY_NOOP_TRUNCATION_MODES_P.
	(simplify_subreg): Use simplify_int_lowpart for TRUNCATE.
	* config/mips/mips.c (mips_truncated_op_cost): New function.
	(mips_rtx_costs): Adjust test for BADDU.
	* config/mips/mips.md (*baddu_di<mode>): Push truncates to operands.

Comments

Eric Botcazou Oct. 6, 2012, 11:11 a.m. UTC | #1
> Tested on x86_64-linux-gnu, mipsisa32-elf and mipsisa64-elf.  Also tested
> by making sure that there were no code differences for a set of gcc .ii
> files on gcc20 (-O2 -march=native).  OK to install?

Are you sure that generating TRUNCATEs out of nowhere in simplify_subreg is 
always correct?

> gcc/
> 	* machmode.h (GET_MODE_UNIT_PRECISION): New macro.
> 	* simplify-rtx.c (simplify_truncation): New function.

You should say where it comes from.

> 	(simplify_unary_operation_1): Use it.  Remove sign bit test
> 	for !TRULY_NOOP_TRUNCATION_MODES_P.

	(simplify_unary_operation_1) <TRUNCATE>: ...

> 	(simplify_subreg): Use simplify_int_lowpart for TRUNCATE.

This function doesn't exist.  And this is misleading, it's not just for 
TRUNCATE, it's for a truncation to the lowpart.

>  /* Try to simplify a unary operation CODE whose output mode is to be
>     MODE with input operand OP whose mode was originally OP_MODE.
>     Return zero if no simplification can be made.  */
> @@ -689,12 +850,6 @@ simplify_unary_operation_1 (enum rtx_cod
>  	    op_mode = mode;
>  	  in2 = simplify_gen_unary (NOT, op_mode, in2, op_mode);
> 
> -	  if (GET_CODE (in2) == NOT && GET_CODE (in1) != NOT)
> -	    {
> -	      rtx tem = in2;
> -	      in2 = in1; in1 = tem;
> -	    }
> -
>  	  return gen_rtx_fmt_ee (GET_CODE (op) == IOR ? AND : IOR,
>  				 mode, in1, in2);
>  	}

Why is this hunk here?

> @@ -5595,14 +5730,6 @@ simplify_subreg (enum machine_mode outer
>        return NULL_RTX;
>      }
> 
> -  /* Merge implicit and explicit truncations.  */
> -
> -  if (GET_CODE (op) == TRUNCATE
> -      && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode)
> -      && subreg_lowpart_offset (outermode, innermode) == byte)
> -    return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0),
> -			       GET_MODE (XEXP (op, 0)));
> -
>    /* SUBREG of a hard register => just change the register number
>       and/or mode.  If the hard register is not valid in that mode,
>       suppress this simplification.  If the hard register is the stack,

Likewise.

Richard Sandiford Oct. 6, 2012, 12:39 p.m. UTC | #2
Thanks for the review.

Eric Botcazou <ebotcazou@adacore.com> writes:
>> Tested on x86_64-linux-gnu, mipsisa32-elf and mipsisa64-elf.  Also tested
>> by making sure that there were no code differences for a set of gcc .ii
>> files on gcc20 (-O2 -march=native).  OK to install?
>
> Are you sure that generating TRUNCATEs out of nowhere in simplify_subreg is 
> always correct?

We only use TRUNCATEs when truncating the operands of an existing operation.
We don't convert the SUBREG itself to a TRUNCATE.  I don't think it matters
whether the narrowing of the operation was brought about by a TRUNCATE
or by a SUBREG.
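
To make the operand-narrowing concrete, here is a small C sketch of the
value semantics only (it does not model any target's register invariants).
It assumes DImode and SImode behave like uint64_t and uint32_t with
wrap-around arithmetic; the function names are hypothetical.  Truncating
the result of PLUS, MINUS or MULT gives the same value as operating on the
truncated operands, whereas a right shift does not have that property.

```c
#include <assert.h>
#include <stdint.h>

/* Model of (truncate:SI x): keep the low 32 bits.  */
uint32_t
model_trunc_si (uint64_t x)
{
  return (uint32_t) x;
}

/* Model of (truncate:SI (op:DI x y)).  Any OP other than '+', '-'
   or '*' is treated as a logical right shift.  */
uint32_t
op_then_truncate (char op, uint64_t x, uint64_t y)
{
  switch (op)
    {
    case '+': return model_trunc_si (x + y);
    case '-': return model_trunc_si (x - y);
    case '*': return model_trunc_si (x * y);
    default:  return model_trunc_si (x >> (y & 63));
    }
}

/* Model of (op:SI (truncate:SI x) (truncate:SI y)).  */
uint32_t
truncate_then_op (char op, uint64_t x, uint64_t y)
{
  uint32_t a = model_trunc_si (x);
  uint32_t b = model_trunc_si (y);

  switch (op)
    {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    default:  return a >> (b & 31);
    }
}
```

With x = 0x100000000 and a shift count of 4 the two shift forms differ
(0x10000000 versus 0), which is why shifts need the extra conditions on
the shifted operand and the shift count.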

I think modelling it as a TRUNCATE operation is correct for
!TRULY_NOOP_TRUNCATION (it's the bug that Andrew pointed out).
And we shouldn't generate an actual TRUNCATE rtx for
TRULY_NOOP_TRUNCATION (the thing about making
simplify_gen_unary (TRUNCATE, ...) no worse than simplify_gen_subreg
for those targets).  I suppose:

      /* We can't handle truncation to a partial integer mode here
         because we don't know the real bitsize of the partial
         integer mode.  */
      if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
        break;

might be a problem though; we should still allow a subreg to be
generated.  Is that what you were thinking of, or something else?

I'm certainly open to other ideas though.  Or do you think that I was
wrong about doing this narrowing in simplify_subreg to begin with?

>> gcc/
>> 	* machmode.h (GET_MODE_UNIT_PRECISION): New macro.
>> 	* simplify-rtx.c (simplify_truncation): New function.
>
> You should say where it comes from.

OK.

>> 	(simplify_unary_operation_1): Use it.  Remove sign bit test
>> 	for !TRULY_NOOP_TRUNCATION_MODES_P.
>
> 	(simplify_unary_operation_1) <TRUNCATE>: ...
>
>> 	(simplify_subreg): Use simplify_int_lowpart for TRUNCATE.
>
> This function doesn't exist.  And this is misleading, it's not just for 
> TRUNCATE, it's for a truncation to the lowpart.

Curses, forgot to update that part.  Thanks.

>>  /* Try to simplify a unary operation CODE whose output mode is to be
>>     MODE with input operand OP whose mode was originally OP_MODE.
>>     Return zero if no simplification can be made.  */
>> @@ -689,12 +850,6 @@ simplify_unary_operation_1 (enum rtx_cod
>>  	    op_mode = mode;
>>  	  in2 = simplify_gen_unary (NOT, op_mode, in2, op_mode);
>> 
>> -	  if (GET_CODE (in2) == NOT && GET_CODE (in1) != NOT)
>> -	    {
>> -	      rtx tem = in2;
>> -	      in2 = in1; in1 = tem;
>> -	    }
>> -
>>  	  return gen_rtx_fmt_ee (GET_CODE (op) == IOR ? AND : IOR,
>>  				 mode, in1, in2);
>>  	}
>
> Why is this hunk here?

Sorry, I've no idea :-(

>> @@ -5595,14 +5730,6 @@ simplify_subreg (enum machine_mode outer
>>        return NULL_RTX;
>>      }
>> 
>> -  /* Merge implicit and explicit truncations.  */
>> -
>> -  if (GET_CODE (op) == TRUNCATE
>> -      && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode)
>> -      && subreg_lowpart_offset (outermode, innermode) == byte)
>> -    return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0),
>> -			       GET_MODE (XEXP (op, 0)));
>> -
>>    /* SUBREG of a hard register => just change the register number
>>       and/or mode.  If the hard register is not valid in that mode,
>>       suppress this simplification.  If the hard register is the stack,
>
> Likewise.

This moved to simplify_truncation:

  /* (truncate:A (truncate:B X)) is (truncate:A X).  */
  if (GET_CODE (op) == TRUNCATE)
    return simplify_gen_unary (TRUNCATE, mode, XEXP (op, 0),
			       GET_MODE (XEXP (op, 0)));
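
As a sanity check of that identity on plain values, here is a tiny C
sketch, assuming X is DImode, B is HImode and A is QImode, and that
truncation simply keeps the low bits.  The function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of (truncate:HI x) for a DImode x.  */
uint16_t
model_trunc_hi (uint64_t x)
{
  return (uint16_t) x;
}

/* Model of (truncate:QI x) for an HImode x.  */
uint8_t
model_trunc_qi_from_hi (uint16_t x)
{
  return (uint8_t) x;
}

/* Model of (truncate:QI x) for a DImode x.  */
uint8_t
model_trunc_qi (uint64_t x)
{
  return (uint8_t) x;
}

/* Return true if (truncate:QI (truncate:HI x)) equals (truncate:QI x).  */
bool
nested_truncations_agree (uint64_t x)
{
  return model_trunc_qi_from_hi (model_trunc_hi (x)) == model_trunc_qi (x);
}
```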

I haven't included an updated patch because of your general concern above.

Richard

Eric Botcazou Oct. 6, 2012, 1:03 p.m. UTC | #3
> I think modelling it as a TRUNCATE operation is correct for
> !TRULY_NOOP_TRUNCATION (it's the bug that Andrew pointed out).
> And we shouldn't generate an actual TRUNCATE rtx for
> TRULY_NOOP_TRUNCATION (the thing about making
> simplify_gen_unary (TRUNCATE, ...) no worse than simplify_gen_subreg
> for those targets).  I suppose:
> 
>       /* We can't handle truncation to a partial integer mode here
>          because we don't know the real bitsize of the partial
>          integer mode.  */
>       if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
>         break;
> 
> might be a problem though; we should still allow a subreg to be
> generated.  Is that what you were thinking of, or something else?

I was thinking of the !TRULY_NOOP_TRUNCATION case, where the two operations 
aren't equivalent.  Generating TRUNCATE in simplify_subreg seems suspicious to 
me in this case but, if not doing it is the source of the bug, I guess I need 
to do some homework on this TRULY_NOOP_TRUNCATION stuff. :-)

Maybe add a blurb to the head comment of simplify_truncation, explaining that
it is valid to call the function both for TRUNCATEs and truncations to the 
lowpart, and why it is correct to generate new TRUNCATEs in the latter case.

Patch

Index: gcc/machmode.h
===================================================================
--- gcc/machmode.h	2012-06-23 08:30:36.000000000 +0100
+++ gcc/machmode.h	2012-10-06 10:03:47.146873855 +0100
@@ -217,6 +217,11 @@  #define GET_MODE_UNIT_SIZE(MODE)		\
 #define GET_MODE_UNIT_BITSIZE(MODE) \
   ((unsigned short) (GET_MODE_UNIT_SIZE (MODE) * BITS_PER_UNIT))
 
+#define GET_MODE_UNIT_PRECISION(MODE)		\
+  (GET_MODE_INNER (MODE) == VOIDmode		\
+   ? GET_MODE_PRECISION (MODE)			\
+   : GET_MODE_PRECISION (GET_MODE_INNER (MODE)))
+
 /* Get the number of units in the object.  */
 
 extern const unsigned char mode_nunits[NUM_MACHINE_MODES];
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2012-10-02 20:34:15.969129966 +0100
+++ gcc/simplify-rtx.c	2012-10-06 10:08:08.349859303 +0100
@@ -564,6 +564,167 @@  simplify_replace_rtx (rtx x, const_rtx o
   return simplify_replace_fn_rtx (x, old_rtx, 0, new_rtx);
 }
 
+/* Try to simplify a MODE truncation of OP, which has OP_MODE.  */
+
+static rtx
+simplify_truncation (enum machine_mode mode, rtx op,
+		     enum machine_mode op_mode)
+{
+  unsigned int precision = GET_MODE_UNIT_PRECISION (mode);
+  unsigned int op_precision = GET_MODE_UNIT_PRECISION (op_mode);
+  gcc_assert (precision <= op_precision);
+
+  /* Optimize truncations of zero and sign extended values.  */
+  if (GET_CODE (op) == ZERO_EXTEND
+      || GET_CODE (op) == SIGN_EXTEND)
+    {
+      /* There are three possibilities.  If MODE is the same as the
+	 origmode, we can omit both the extension and the subreg.
+	 If MODE is not larger than the origmode, we can apply the
+	 truncation without the extension.  Finally, if the outermode
+	 is larger than the origmode, we can just extend to the appropriate
+	 mode.  */
+      enum machine_mode origmode = GET_MODE (XEXP (op, 0));
+      if (mode == origmode)
+	return XEXP (op, 0);
+      else if (precision <= GET_MODE_UNIT_PRECISION (origmode))
+	return simplify_gen_unary (TRUNCATE, mode,
+				   XEXP (op, 0), origmode);
+      else
+	return simplify_gen_unary (GET_CODE (op), mode,
+				   XEXP (op, 0), origmode);
+    }
+
+  /* Simplify (truncate:SI (op:DI (x:DI) (y:DI)))
+     to (op:SI (truncate:SI (x:DI)) (truncate:SI (y:DI))).  */
+  if (GET_CODE (op) == PLUS
+      || GET_CODE (op) == MINUS
+      || GET_CODE (op) == MULT)
+    {
+      rtx op0 = simplify_gen_unary (TRUNCATE, mode, XEXP (op, 0), op_mode);
+      if (op0)
+	{
+	  rtx op1 = simplify_gen_unary (TRUNCATE, mode, XEXP (op, 1), op_mode);
+	  if (op1)
+	    return simplify_gen_binary (GET_CODE (op), mode, op0, op1);
+	}
+    }
+
+  /* Simplify (truncate:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C)) into
+     to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and
+     the outer subreg is effectively a truncation to the original mode.  */
+  if ((GET_CODE (op) == LSHIFTRT
+       || GET_CODE (op) == ASHIFTRT)
+      /* Ensure that OP_MODE is at least twice as wide as MODE
+	 to avoid the possibility that an outer LSHIFTRT shifts by more
+	 than the sign extension's sign_bit_copies and introduces zeros
+	 into the high bits of the result.  */
+      && 2 * precision <= op_precision
+      && CONST_INT_P (XEXP (op, 1))
+      && GET_CODE (XEXP (op, 0)) == SIGN_EXTEND
+      && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode
+      && INTVAL (XEXP (op, 1)) < precision)
+    return simplify_gen_binary (ASHIFTRT, mode,
+				XEXP (XEXP (op, 0), 0), XEXP (op, 1));
+
+  /* Likewise (truncate:QI (lshiftrt:SI (zero_extend:SI (x:QI)) C)) into
+     to (lshiftrt:QI (x:QI) C), where C is a suitable small constant and
+     the outer subreg is effectively a truncation to the original mode.  */
+  if ((GET_CODE (op) == LSHIFTRT
+       || GET_CODE (op) == ASHIFTRT)
+      && CONST_INT_P (XEXP (op, 1))
+      && GET_CODE (XEXP (op, 0)) == ZERO_EXTEND
+      && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode
+      && INTVAL (XEXP (op, 1)) < precision)
+    return simplify_gen_binary (LSHIFTRT, mode,
+				XEXP (XEXP (op, 0), 0), XEXP (op, 1));
+
+  /* Likewise (truncate:QI (ashift:SI (zero_extend:SI (x:QI)) C)) into
+     to (ashift:QI (x:QI) C), where C is a suitable small constant and
+     the outer subreg is effectively a truncation to the original mode.  */
+  if (GET_CODE (op) == ASHIFT
+      && CONST_INT_P (XEXP (op, 1))
+      && (GET_CODE (XEXP (op, 0)) == ZERO_EXTEND
+	  || GET_CODE (XEXP (op, 0)) == SIGN_EXTEND)
+      && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode
+      && INTVAL (XEXP (op, 1)) < precision)
+    return simplify_gen_binary (ASHIFT, mode,
+				XEXP (XEXP (op, 0), 0), XEXP (op, 1));
+
+  /* Recognize a word extraction from a multi-word subreg.  */
+  if ((GET_CODE (op) == LSHIFTRT
+       || GET_CODE (op) == ASHIFTRT)
+      && SCALAR_INT_MODE_P (mode)
+      && SCALAR_INT_MODE_P (op_mode)
+      && precision >= BITS_PER_WORD
+      && 2 * precision <= op_precision
+      && CONST_INT_P (XEXP (op, 1))
+      && (INTVAL (XEXP (op, 1)) & (precision - 1)) == 0
+      && INTVAL (XEXP (op, 1)) >= 0
+      && INTVAL (XEXP (op, 1)) < op_precision)
+    {
+      int byte = subreg_lowpart_offset (mode, op_mode);
+      int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT;
+      return simplify_gen_subreg (mode, XEXP (op, 0), op_mode,
+				  (WORDS_BIG_ENDIAN
+				   ? byte - shifted_bytes
+				   : byte + shifted_bytes));
+    }
+
+  /* If we have a TRUNCATE of a right shift of MEM, make a new MEM
+     and try replacing the TRUNCATE and shift with it.  Don't do this
+     if the MEM has a mode-dependent address.  */
+  if ((GET_CODE (op) == LSHIFTRT
+       || GET_CODE (op) == ASHIFTRT)
+      && SCALAR_INT_MODE_P (op_mode)
+      && MEM_P (XEXP (op, 0))
+      && CONST_INT_P (XEXP (op, 1))
+      && (INTVAL (XEXP (op, 1)) % GET_MODE_BITSIZE (mode)) == 0
+      && INTVAL (XEXP (op, 1)) > 0
+      && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (op_mode)
+      && ! mode_dependent_address_p (XEXP (XEXP (op, 0), 0),
+				     MEM_ADDR_SPACE (XEXP (op, 0)))
+      && ! MEM_VOLATILE_P (XEXP (op, 0))
+      && (GET_MODE_SIZE (mode) >= UNITS_PER_WORD
+	  || WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN))
+    {
+      int byte = subreg_lowpart_offset (mode, op_mode);
+      int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT;
+      return adjust_address_nv (XEXP (op, 0), mode,
+				(WORDS_BIG_ENDIAN
+				 ? byte - shifted_bytes
+				 : byte + shifted_bytes));
+    }
+
+  /* (truncate:SI (OP:DI ({sign,zero}_extend:DI foo:SI))) is
+     (OP:SI foo:SI) if OP is NEG or ABS.  */
+  if ((GET_CODE (op) == ABS
+       || GET_CODE (op) == NEG)
+      && (GET_CODE (XEXP (op, 0)) == SIGN_EXTEND
+	  || GET_CODE (XEXP (op, 0)) == ZERO_EXTEND)
+      && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode)
+    return simplify_gen_unary (GET_CODE (op), mode,
+			       XEXP (XEXP (op, 0), 0), mode);
+
+  /* (truncate:A (subreg:B (truncate:C X) 0)) is
+     (truncate:A X).  */
+  if (GET_CODE (op) == SUBREG
+      && SCALAR_INT_MODE_P (mode)
+      && SCALAR_INT_MODE_P (op_mode)
+      && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op)))
+      && GET_CODE (SUBREG_REG (op)) == TRUNCATE
+      && subreg_lowpart_p (op))
+    return simplify_gen_unary (TRUNCATE, mode, XEXP (SUBREG_REG (op), 0),
+			       GET_MODE (XEXP (SUBREG_REG (op), 0)));
+
+  /* (truncate:A (truncate:B X)) is (truncate:A X).  */
+  if (GET_CODE (op) == TRUNCATE)
+    return simplify_gen_unary (TRUNCATE, mode, XEXP (op, 0),
+			       GET_MODE (XEXP (op, 0)));
+
+  return NULL_RTX;
+}
+
 /* Try to simplify a unary operation CODE whose output mode is to be
    MODE with input operand OP whose mode was originally OP_MODE.
    Return zero if no simplification can be made.  */
@@ -689,12 +850,6 @@  simplify_unary_operation_1 (enum rtx_cod
 	    op_mode = mode;
 	  in2 = simplify_gen_unary (NOT, op_mode, in2, op_mode);
 
-	  if (GET_CODE (in2) == NOT && GET_CODE (in1) != NOT)
-	    {
-	      rtx tem = in2;
-	      in2 = in1; in1 = tem;
-	    }
-
 	  return gen_rtx_fmt_ee (GET_CODE (op) == IOR ? AND : IOR,
 				 mode, in1, in2);
 	}
@@ -821,44 +976,24 @@  simplify_unary_operation_1 (enum rtx_cod
       if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
         break;
 
-      /* (truncate:SI ({sign,zero}_extend:DI foo:SI)) == foo:SI.  */
-      if ((GET_CODE (op) == SIGN_EXTEND
-	   || GET_CODE (op) == ZERO_EXTEND)
-	  && GET_MODE (XEXP (op, 0)) == mode)
-	return XEXP (op, 0);
-
-      /* (truncate:SI (OP:DI ({sign,zero}_extend:DI foo:SI))) is
-	 (OP:SI foo:SI) if OP is NEG or ABS.  */
-      if ((GET_CODE (op) == ABS
-	   || GET_CODE (op) == NEG)
-	  && (GET_CODE (XEXP (op, 0)) == SIGN_EXTEND
-	      || GET_CODE (XEXP (op, 0)) == ZERO_EXTEND)
-	  && GET_MODE (XEXP (XEXP (op, 0), 0)) == mode)
-	return simplify_gen_unary (GET_CODE (op), mode,
-				   XEXP (XEXP (op, 0), 0), mode);
+      /* Don't optimize (lshiftrt (mult ...)) as it would interfere
+	 with the umulXi3_highpart patterns.  */
+      if (GET_CODE (op) == LSHIFTRT
+	  && GET_CODE (XEXP (op, 0)) == MULT)
+	break;
 
-      /* (truncate:A (subreg:B (truncate:C X) 0)) is
-	 (truncate:A X).  */
-      if (GET_CODE (op) == SUBREG
-	  && GET_CODE (SUBREG_REG (op)) == TRUNCATE
-	  && subreg_lowpart_p (op))
-	return simplify_gen_unary (TRUNCATE, mode, XEXP (SUBREG_REG (op), 0),
-				   GET_MODE (XEXP (SUBREG_REG (op), 0)));
+      if (GET_MODE (op) != VOIDmode)
+	{
+	  temp = simplify_truncation (mode, op, GET_MODE (op));
+	  if (temp)
+	    return temp;
+	}
 
       /* If we know that the value is already truncated, we can
-         replace the TRUNCATE with a SUBREG.  Note that this is also
-         valid if TRULY_NOOP_TRUNCATION is false for the corresponding
-         modes we just have to apply a different definition for
-         truncation.  But don't do this for an (LSHIFTRT (MULT ...))
-         since this will cause problems with the umulXi3_highpart
-         patterns.  */
-      if ((TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op))
-	   ? (num_sign_bit_copies (op, GET_MODE (op))
-	      > (unsigned int) (GET_MODE_PRECISION (GET_MODE (op))
-				- GET_MODE_PRECISION (mode)))
-	   : truncated_to_mode (mode, op))
-	  && ! (GET_CODE (op) == LSHIFTRT
-		&& GET_CODE (XEXP (op, 0)) == MULT))
+	 replace the TRUNCATE with a SUBREG.  */
+      if (GET_MODE_NUNITS (mode) == 1
+	  && (TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op))
+	      || truncated_to_mode (mode, op)))
 	return rtl_hooks.gen_lowpart_no_emit (mode, op);
 
       /* A truncate of a comparison can be replaced with a subreg if
@@ -5595,14 +5730,6 @@  simplify_subreg (enum machine_mode outer
       return NULL_RTX;
     }
 
-  /* Merge implicit and explicit truncations.  */
-
-  if (GET_CODE (op) == TRUNCATE
-      && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (innermode)
-      && subreg_lowpart_offset (outermode, innermode) == byte)
-    return simplify_gen_unary (TRUNCATE, outermode, XEXP (op, 0),
-			       GET_MODE (XEXP (op, 0)));
-
   /* SUBREG of a hard register => just change the register number
      and/or mode.  If the hard register is not valid in that mode,
      suppress this simplification.  If the hard register is the stack,
@@ -5688,160 +5815,23 @@  simplify_subreg (enum machine_mode outer
       return NULL_RTX;
     }
 
-  /* Optimize SUBREG truncations of zero and sign extended values.  */
-  if ((GET_CODE (op) == ZERO_EXTEND
-       || GET_CODE (op) == SIGN_EXTEND)
-      && SCALAR_INT_MODE_P (innermode)
-      && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode))
+  /* A SUBREG resulting from a zero extension may fold to zero if
+     it extracts higher bits than the ZERO_EXTEND's source bits.  */
+  if (GET_CODE (op) == ZERO_EXTEND)
     {
       unsigned int bitpos = subreg_lsb_1 (outermode, innermode, byte);
-
-      /* If we're requesting the lowpart of a zero or sign extension,
-	 there are three possibilities.  If the outermode is the same
-	 as the origmode, we can omit both the extension and the subreg.
-	 If the outermode is not larger than the origmode, we can apply
-	 the truncation without the extension.  Finally, if the outermode
-	 is larger than the origmode, but both are integer modes, we
-	 can just extend to the appropriate mode.  */
-      if (bitpos == 0)
-	{
-	  enum machine_mode origmode = GET_MODE (XEXP (op, 0));
-	  if (outermode == origmode)
-	    return XEXP (op, 0);
-	  if (GET_MODE_PRECISION (outermode) <= GET_MODE_PRECISION (origmode))
-	    return simplify_gen_subreg (outermode, XEXP (op, 0), origmode,
-					subreg_lowpart_offset (outermode,
-							       origmode));
-	  if (SCALAR_INT_MODE_P (outermode))
-	    return simplify_gen_unary (GET_CODE (op), outermode,
-				       XEXP (op, 0), origmode);
-	}
-
-      /* A SUBREG resulting from a zero extension may fold to zero if
-	 it extracts higher bits that the ZERO_EXTEND's source bits.  */
-      if (GET_CODE (op) == ZERO_EXTEND
-	  && bitpos >= GET_MODE_PRECISION (GET_MODE (XEXP (op, 0))))
+      if (bitpos >= GET_MODE_PRECISION (GET_MODE (XEXP (op, 0))))
 	return CONST0_RTX (outermode);
     }
 
-  /* Simplify (subreg:SI (op:DI ((x:DI) (y:DI)), 0)
-     to (op:SI (subreg:SI (x:DI) 0) (subreg:SI (x:DI) 0)), where
-     the outer subreg is effectively a truncation to the original mode.  */
-  if ((GET_CODE (op) == PLUS
-       || GET_CODE (op) == MINUS
-       || GET_CODE (op) == MULT)
-      && SCALAR_INT_MODE_P (outermode)
+  if (SCALAR_INT_MODE_P (outermode)
       && SCALAR_INT_MODE_P (innermode)
       && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)
       && byte == subreg_lowpart_offset (outermode, innermode))
     {
-      rtx op0 = simplify_gen_subreg (outermode, XEXP (op, 0),
-                                     innermode, byte);
-      if (op0)
-        {
-          rtx op1 = simplify_gen_subreg (outermode, XEXP (op, 1),
-                                         innermode, byte);
-          if (op1)
-            return simplify_gen_binary (GET_CODE (op), outermode, op0, op1);
-        }
-    }
-
-  /* Simplify (subreg:QI (lshiftrt:SI (sign_extend:SI (x:QI)) C), 0) into
-     to (ashiftrt:QI (x:QI) C), where C is a suitable small constant and
-     the outer subreg is effectively a truncation to the original mode.  */
-  if ((GET_CODE (op) == LSHIFTRT
-       || GET_CODE (op) == ASHIFTRT)
-      && SCALAR_INT_MODE_P (outermode)
-      && SCALAR_INT_MODE_P (innermode)
-      /* Ensure that OUTERMODE is at least twice as wide as the INNERMODE
-	 to avoid the possibility that an outer LSHIFTRT shifts by more
-	 than the sign extension's sign_bit_copies and introduces zeros
-	 into the high bits of the result.  */
-      && (2 * GET_MODE_PRECISION (outermode)) <= GET_MODE_PRECISION (innermode)
-      && CONST_INT_P (XEXP (op, 1))
-      && GET_CODE (XEXP (op, 0)) == SIGN_EXTEND
-      && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode
-      && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (outermode)
-      && subreg_lsb_1 (outermode, innermode, byte) == 0)
-    return simplify_gen_binary (ASHIFTRT, outermode,
-				XEXP (XEXP (op, 0), 0), XEXP (op, 1));
-
-  /* Likewise (subreg:QI (lshiftrt:SI (zero_extend:SI (x:QI)) C), 0) into
-     to (lshiftrt:QI (x:QI) C), where C is a suitable small constant and
-     the outer subreg is effectively a truncation to the original mode.  */
-  if ((GET_CODE (op) == LSHIFTRT
-       || GET_CODE (op) == ASHIFTRT)
-      && SCALAR_INT_MODE_P (outermode)
-      && SCALAR_INT_MODE_P (innermode)
-      && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)
-      && CONST_INT_P (XEXP (op, 1))
-      && GET_CODE (XEXP (op, 0)) == ZERO_EXTEND
-      && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode
-      && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (outermode)
-      && subreg_lsb_1 (outermode, innermode, byte) == 0)
-    return simplify_gen_binary (LSHIFTRT, outermode,
-				XEXP (XEXP (op, 0), 0), XEXP (op, 1));
-
-  /* Likewise (subreg:QI (ashift:SI (zero_extend:SI (x:QI)) C), 0) into
-     to (ashift:QI (x:QI) C), where C is a suitable small constant and
-     the outer subreg is effectively a truncation to the original mode.  */
-  if (GET_CODE (op) == ASHIFT
-      && SCALAR_INT_MODE_P (outermode)
-      && SCALAR_INT_MODE_P (innermode)
-      && GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode)
-      && CONST_INT_P (XEXP (op, 1))
-      && (GET_CODE (XEXP (op, 0)) == ZERO_EXTEND
-	  || GET_CODE (XEXP (op, 0)) == SIGN_EXTEND)
-      && GET_MODE (XEXP (XEXP (op, 0), 0)) == outermode
-      && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (outermode)
-      && subreg_lsb_1 (outermode, innermode, byte) == 0)
-    return simplify_gen_binary (ASHIFT, outermode,
-				XEXP (XEXP (op, 0), 0), XEXP (op, 1));
-
-  /* Recognize a word extraction from a multi-word subreg.  */
-  if ((GET_CODE (op) == LSHIFTRT
-       || GET_CODE (op) == ASHIFTRT)
-      && SCALAR_INT_MODE_P (innermode)
-      && GET_MODE_PRECISION (outermode) >= BITS_PER_WORD
-      && GET_MODE_PRECISION (innermode) >= (2 * GET_MODE_PRECISION (outermode))
-      && CONST_INT_P (XEXP (op, 1))
-      && (INTVAL (XEXP (op, 1)) & (GET_MODE_PRECISION (outermode) - 1)) == 0
-      && INTVAL (XEXP (op, 1)) >= 0
-      && INTVAL (XEXP (op, 1)) < GET_MODE_PRECISION (innermode)
-      && byte == subreg_lowpart_offset (outermode, innermode))
-    {
-      int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT;
-      return simplify_gen_subreg (outermode, XEXP (op, 0), innermode,
-				  (WORDS_BIG_ENDIAN
-				   ? byte - shifted_bytes
-				   : byte + shifted_bytes));
-    }
-
-  /* If we have a lowpart SUBREG of a right shift of MEM, make a new MEM
-     and try replacing the SUBREG and shift with it.  Don't do this if
-     the MEM has a mode-dependent address or if we would be widening it.  */
-
-  if ((GET_CODE (op) == LSHIFTRT
-       || GET_CODE (op) == ASHIFTRT)
-      && SCALAR_INT_MODE_P (innermode)
-      && MEM_P (XEXP (op, 0))
-      && CONST_INT_P (XEXP (op, 1))
-      && GET_MODE_SIZE (outermode) < GET_MODE_SIZE (GET_MODE (op))
-      && (INTVAL (XEXP (op, 1)) % GET_MODE_BITSIZE (outermode)) == 0
-      && INTVAL (XEXP (op, 1)) > 0
-      && INTVAL (XEXP (op, 1)) < GET_MODE_BITSIZE (innermode)
-      && ! mode_dependent_address_p (XEXP (XEXP (op, 0), 0),
-				     MEM_ADDR_SPACE (XEXP (op, 0)))
-      && ! MEM_VOLATILE_P (XEXP (op, 0))
-      && byte == subreg_lowpart_offset (outermode, innermode)
-      && (GET_MODE_SIZE (outermode) >= UNITS_PER_WORD
-	  || WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN))
-    {
-      int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT;
-      return adjust_address_nv (XEXP (op, 0), outermode,
-				(WORDS_BIG_ENDIAN
-				 ? byte - shifted_bytes
-				 : byte + shifted_bytes));
+      rtx tem = simplify_truncation (outermode, op, innermode);
+      if (tem)
+	return tem;
     }
 
   return NULL_RTX;
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	2012-10-02 21:02:21.000000000 +0100
+++ gcc/config/mips/mips.c	2012-10-06 10:06:42.617864078 +0100
@@ -3527,6 +3527,17 @@  mips_set_reg_reg_cost (enum machine_mode
     }
 }
 
+/* Return the cost of an operand X that can be truncated for free.
+   SPEED says whether we're optimizing for size or speed.  */
+
+static int
+mips_truncated_op_cost (rtx x, bool speed)
+{
+  if (GET_CODE (x) == TRUNCATE)
+    x = XEXP (x, 0);
+  return set_src_cost (x, speed);
+}
+
 /* Implement TARGET_RTX_COSTS.  */
 
 static bool
@@ -3907,12 +3918,13 @@  mips_rtx_costs (rtx x, int code, int out
     case ZERO_EXTEND:
       if (outer_code == SET
 	  && ISA_HAS_BADDU
-	  && (GET_CODE (XEXP (x, 0)) == TRUNCATE
-	      || GET_CODE (XEXP (x, 0)) == SUBREG)
 	  && GET_MODE (XEXP (x, 0)) == QImode
-	  && GET_CODE (XEXP (XEXP (x, 0), 0)) == PLUS)
+	  && GET_CODE (XEXP (x, 0)) == PLUS)
 	{
-	  *total = set_src_cost (XEXP (XEXP (x, 0), 0), speed);
+	  rtx plus = XEXP (x, 0);
+	  *total = (COSTS_N_INSNS (1)
+		    + mips_truncated_op_cost (XEXP (plus, 0), speed)
+		    + mips_truncated_op_cost (XEXP (plus, 1), speed));
 	  return true;
 	}
       *total = mips_zero_extend_cost (mode, XEXP (x, 0));
Index: gcc/config/mips/mips.md
===================================================================
--- gcc/config/mips/mips.md	2012-10-02 20:37:34.000000000 +0100
+++ gcc/config/mips/mips.md	2012-10-06 10:03:47.156873851 +0100
@@ -1305,9 +1305,8 @@  (define_insn "*baddu_si"
 (define_insn "*baddu_di<mode>"
   [(set (match_operand:GPR 0 "register_operand" "=d")
         (zero_extend:GPR
-	 (truncate:QI
-	  (plus:DI (match_operand:DI 1 "register_operand" "d")
-		   (match_operand:DI 2 "register_operand" "d")))))]
+	 (plus:QI (truncate:QI (match_operand:DI 1 "register_operand" "d"))
+		  (truncate:QI (match_operand:DI 2 "register_operand" "d")))))]
   "ISA_HAS_BADDU && TARGET_64BIT"
   "baddu\\t%0,%1,%2"
   [(set_attr "alu_type" "add")])