diff mbox

PR43902 patch: Widening multiply-accumulate

Message ID 4C214656.4050501@codesourcery.com
State New
Headers show

Commit Message

Bernd Schmidt June 22, 2010, 11:25 p.m. UTC
Here's a patch to fix most of PR43902, which is about missing support
for multiply-accumulate instructions on MIPS.  Jim Wilson did most of
the work on this patch, adding a new optimization in the
optimize_widening_multiply pass; I've slightly modified it to add
support for ternary gimple statements, as well as adding ARM bits.
There's some history and discussion in the PR.

Most passes probably don't need to handle ternary gimple statements
(tree-ssa-math-opts runs quite late), so I've provided some wrappers
around frequently used functions so that passes can for now continue to
use the simpler interface.

I've tried for a while to convert DOT_PROD_EXPR to use this new
infrastructure, but it took my rather far down into the vectorizer and I
gave up.  It's probably something the vectorizer maintainers should look
into.

Bootstrapped and regression tested on i686-linux.  Ok?


Bernd
PR target/43902
	* tree-pretty-print.c (dump_generic_node, op_code_prio): Add
	WIDEN_MULT_PLUS_EXPR and WIDEN_MULT_MINUS_EXPR.
	* optabs.c (optab_for_tree_code): Likewise.
	(expand_widen_pattern_expr): Likewise.
	* tree-ssa-math-opts.c (convert_mult_to_widen): New function, broken
	out of execute_optimize_widening_mul.
	(convert_plusminus_to_widen): New function.
	(execute_optimize_widening_mul): Use the two new functions.
	* expr.c (expand_expr_real_2): Add support for GIMPLE_TERNARY_RHS.
	Remove code to generate widening multiply-accumulate.  Add support
	for WIDEN_MULT_PLUS_EXPR and WIDEN_MULT_MINUS_EXPR.
	* gimple-pretty-print.c (dump_ternary_rhs): New function.
	(dump_gimple_assign): Call it when appropriate.
	* tree.def (WIDEN_MULT_PLUS_EXPR, WIDEN_MULT_MINUS_EXPR): New codes.
	* gimple-fold.c (fold_gimple_assign): Support GIMPLE_TERNARY_RHS.
	* cfgexpand.c (gimple_assign_rhs_to_tree): Likewise.
	(expand_gimple_stmt_1): Likewise.
	(expand_debug_expr): Support WIDEN_MULT_PLUS_EXPR and
	WIDEN_MULT_MINUS_EXPR.
	* tree-ssa-operands.c (get_expr_operands): Likewise.
	* tree-inline.c (estimate_operator_cost): Likewise.
	* gimple.c (extract_ops_from_tree_1): Renamed from
	extract_ops_from_tree.  Add new arg for a third operand; fill it.
	(gimple_build_assign_stat): Support operations with three operands.
	(gimple_build_assign_with_ops_stat): Likewise.
	(gimple_assign_set_rhs_from_tree): Likewise.
	(gimple_assign_set_rhs_with_ops_1): Renamed from
	gimple_assign_set_rhs_with_ops.  Add new arg for a third operand.
	(get_gimple_rhs_num_ops): Support GIMPLE_TERNARY_RHS.
	(get_gimple_rhs_num_ops): Handle WIDEN_MULT_PLUS_EXPR and
	WIDEN_MULT_MINUS_EXPR.
	* gimple.h (enum gimple_rhs_class): Add GIMPLE_TERNARY_RHS.
	(extract_ops_from_tree_1): Adjust declaration.
	(gimple_assign_set_rhs_with_ops_1): Likewise.
	(gimple_build_assign_with_ops): Pass NULL for last operand.
	(gimple_build_assign_with_ops3): New macro.
	(gimple_assign_rhs3, gimple_assign_rhs3_ptr, gimple_assign_set_rhs3,
	gimple_assign_set_rhs_with_ops, extract_ops_from_tree): New inline
	functions.
	* tree-cfg.c (verify_gimple_assign_ternary): New static function.
	(verify_gimple_assign): Call it.

	* config/arm/arm.md (maddsidi4, umaddsidi4): New expanders.
	(maddhisi4): Renamed from mulhisi3addsi.  Operands renumbered.
	(maddhidi4): Likewise.
	
	* gcc.target/arm/wmul-1.c: Test for smlabb instead of smulbb.
	* gcc.target/arm/wmul-3.c: New test.
	* gcc.target/mips/madd-9.c: New test.

Comments

Joseph Myers June 22, 2010, 11:36 p.m. UTC | #1
On Wed, 23 Jun 2010, Bernd Schmidt wrote:

> Most passes probably don't need to handle ternary gimple statements
> (tree-ssa-math-opts runs quite late), so I've provided some wrappers
> around frequently used functions so that passes can for now continue to
> use the simpler interface.

I'm not clear on whether this means that earlier passes will fail to work 
if they see ternary gimple statements, or if they will work but not do any 
optimizations with them, or something else.

Such statements may not be seen in early passes right now.  But generating 
floating-point fused multiply-add from a*b+c in a way that follows C99 
requirements on contracting expressions would require the C front end to 
generate new GIMPLE codes for floating-point fused multiply-add, which I'd 
think would be a ternary statement, so it would be present through all the 
GIMPLE passes.

(C99 only permits contracting within the bounds of a source language 
expression.  So the present approach of describing fused operations with 
the RTL for separate multiplication and addition operations, and letting 
combine generate such instructions, is not correct in C99 terms, and only 
the front end has the information to know where contracting is permitted 
by C99; new GIMPLE and RTL codes would be needed.  New RTL codes would be 
useful anyway without new GIMPLE codes or proper contracting support, to 
allow __builtin_fma* to generate appropriate instructions.)
Bernd Schmidt June 22, 2010, 11:40 p.m. UTC | #2
On 06/23/2010 01:36 AM, Joseph S. Myers wrote:
> On Wed, 23 Jun 2010, Bernd Schmidt wrote:
> 
>> Most passes probably don't need to handle ternary gimple statements
>> (tree-ssa-math-opts runs quite late), so I've provided some wrappers
>> around frequently used functions so that passes can for now continue to
>> use the simpler interface.
> 
> I'm not clear on whether this means that earlier passes will fail to work 
> if they see ternary gimple statements, or if they will work but not do any 
> optimizations with them, or something else.

They will fail to work; for example see the gcc_assert added in
extract_ops_from_tree.

> Such statements may not be seen in early passes right now.  But generating 
> floating-point fused multiply-add from a*b+c in a way that follows C99 
> requirements on contracting expressions would require the C front end to 
> generate new GIMPLE codes for floating-point fused multiply-add, which I'd 
> think would be a ternary statement, so it would be present through all the 
> GIMPLE passes.

If that happens we can modify all the passes and remove the wrappers.


Bernd
Paolo Bonzini June 23, 2010, 2:37 a.m. UTC | #3
On 06/23/2010 01:36 AM, Joseph S. Myers wrote:
> (C99 only permits contracting within the bounds of a source language
> expression.

So that means that you can turn on flag_associative_math if 
!flag_trapping_math && !flag_signed_zeros (like the Fortran front-end 
does), as long as every source language expression is wrapped implicitly 
in a PAREN_EXPR?

However, I think it also needs to split flag_associative_math (e.g. into 
flag_paren_associative_math for GIMPLE and flag_associative_math for 
both GIMPLE and RTL), since RTL doesn't know PAREN_EXPRs.

Paolo
Richard Biener June 23, 2010, 9:26 a.m. UTC | #4
On Wed, Jun 23, 2010 at 1:25 AM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> Here's a patch to fix most of PR43902, which is about missing support
> for multiply-accumulate instructions on MIPS.  Jim Wilson did most of
> the work on this patch, adding a new optimization in the
> optimize_widening_multiply pass; I've slightly modified it to add
> support for ternary gimple statements, as well as adding ARM bits.
> There's some history and discussion in the PR.
>
> Most passes probably don't need to handle ternary gimple statements
> (tree-ssa-math-opts runs quite late), so I've provided some wrappers
> around frequently used functions so that passes can for now continue to
> use the simpler interface.
>
> I've tried for a while to convert DOT_PROD_EXPR to use this new
> infrastructure, but it took my rather far down into the vectorizer and I
> gave up.  It's probably something the vectorizer maintainers should look
> into.
>
> Bootstrapped and regression tested on i686-linux.  Ok?

+/* Widening multiply-accumulate.
+   The first two arguments are of type t1.
+   The third argument and the result are of type t2, such as t2 is at least
+   twice the size of t1.  This is equivalent to a WIDEN_MULT_EXPR operation
+   followed by an add or subtract.  */
+DEFTREECODE (WIDEN_MULT_PLUS_EXPR, "widen_mult_plus_expr", tcc_expression, 3)
+/* This is like the above, except in the final expression the multiply result
+   is subtracted from t3.  */
+DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "widen_mult_plus_expr", tcc_expression, 3)

So it computes (op0 * op1) +- op2?  Please adjust the comment
to say which operands are multiplied and which is added/subtracted.

+    case WIDEN_MULT_PLUS_EXPR:
+    case WIDEN_MULT_MINUS_EXPR:
+      if ((!INTEGRAL_TYPE_P (rhs1_type)
+	   && !FIXED_POINT_TYPE_P (rhs1_type)
+	   && !(TREE_CODE (rhs1_type) == VECTOR_TYPE
+		&& INTEGRAL_TYPE_P (TREE_TYPE (rhs1_type))))
+	  || !useless_type_conversion_p (rhs1_type, rhs2_type)
+	  || !useless_type_conversion_p (lhs_type, rhs3_type)
+	  || 2 * TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (lhs_type)
+	  || TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (rhs2_type))

So this restricts this to integral or fixed-point types.  Can you
document it as such in the comment in tree.def?

Your support for ternary gimple is far from complete - I'm not sure
we want to have this half-supported state (though I guess I don't
care too much and definitely like that we start on it rather than
using more single rhss).

I am going to work on FP MAC detection at some point which
will happen before the vectorizer so I guess I can fixup some more
places.

Can you adjust gimple.texi for the new RHS type?

Thanks,
Richard.
Joseph Myers June 23, 2010, 11:16 a.m. UTC | #5
On Wed, 23 Jun 2010, Paolo Bonzini wrote:

> On 06/23/2010 01:36 AM, Joseph S. Myers wrote:
> > (C99 only permits contracting within the bounds of a source language
> > expression.
> 
> So that means that you can turn on flag_associative_math if
> !flag_trapping_math && !flag_signed_zeros (like the Fortran front-end does),
> as long as every source language expression is wrapped implicitly in a
> PAREN_EXPR?

I am not familiar with the effects of flag_associative_math and 
PAREN_EXPR.

Reassociation is not permitted by C99.  What is permitted, unless the 
FP_CONTRACT pragma is used to disallow it, is contracting - evaluating a 
source language expression involving more than one operator as if it were 
an atomic operation, with a single rounding and a single setting of 
exception flags rather than rounding and setting exceptions for each 
individual source language operator.

Given:

  f = a*b + c;

it is permitted to use a fused operation.  Given:

  t = a*b;
  f = t+c;

it is not permitted to use a fused operation.  If c was a*d, in neither 
case would it be permitted to transform things to a*(b+d) if the addition 
and multiplication in a*(b+d) are carried out as separate operations.  I 
don't see how PAREN_EXPR could be used to represent that a fused 
multiply-add is OK for a*b + a*d but that using a*(b+d) isn't.  (I don't 
know if GCC carries out such distributive transformations anyway, or if 
PAREN_EXPR is meant to affect them, or what command-line options are meant 
to affect them.)
Richard Biener June 23, 2010, 11:27 a.m. UTC | #6
On Wed, Jun 23, 2010 at 1:16 PM, Joseph S. Myers
<joseph@codesourcery.com> wrote:
> On Wed, 23 Jun 2010, Paolo Bonzini wrote:
>
>> On 06/23/2010 01:36 AM, Joseph S. Myers wrote:
>> > (C99 only permits contracting within the bounds of a source language
>> > expression.
>>
>> So that means that you can turn on flag_associative_math if
>> !flag_trapping_math && !flag_signed_zeros (like the Fortran front-end does),
>> as long as every source language expression is wrapped implicitly in a
>> PAREN_EXPR?
>
> I am not familiar with the effects of flag_associative_math and
> PAREN_EXPR.
>
> Reassociation is not permitted by C99.  What is permitted, unless the
> FP_CONTRACT pragma is used to disallow it, is contracting - evaluating a
> source language expression involving more than one operator as if it were
> an atomic operation, with a single rounding and a single setting of
> exception flags rather than rounding and setting exceptions for each
> individual source language operator.
>
> Given:
>
>  f = a*b + c;
>
> it is permitted to use a fused operation.  Given:
>
>  t = a*b;
>  f = t+c;
>
> it is not permitted to use a fused operation.  If c was a*d, in neither
> case would it be permitted to transform things to a*(b+d) if the addition
> and multiplication in a*(b+d) are carried out as separate operations.  I
> don't see how PAREN_EXPR could be used to represent that a fused
> multiply-add is OK for a*b + a*d but that using a*(b+d) isn't.  (I don't
> know if GCC carries out such distributive transformations anyway, or if
> PAREN_EXPR is meant to affect them, or what command-line options are meant
> to affect them.)

PAREN_EXPR is supposed to act as re-association barrier only.
So techincally for t = a*b; f = t+c; we could emit t = (a*b); f = (t+c);
so the middle-end would see ((a*b)+c) which we could avoid to
contract based on the parens.

As for flags we'd need to split up flag_unsafe_math_optimizations
further and introduce -fcontracting-math and/or -fexpanding-math
(covering x*x*x*x -> t = x*x; t*t and pow(x,4) -> t = x*x; t*t).

Richard.

> --
> Joseph S. Myers
> joseph@codesourcery.com
>
Joseph Myers June 23, 2010, 11:38 a.m. UTC | #7
On Wed, 23 Jun 2010, Richard Guenther wrote:

> PAREN_EXPR is supposed to act as re-association barrier only.
> So techincally for t = a*b; f = t+c; we could emit t = (a*b); f = (t+c);
> so the middle-end would see ((a*b)+c) which we could avoid to
> contract based on the parens.

I'd expect that f = a*b+c; would be expanded to exactly the same t = 
(a*b); f = (t+c); GIMPLE, given that no reassociation is permitted in C99; 
certainly f=a*b*c; would be expanded in a way that reflects that this is 
(a*b)*c and no reassociation is permitted.

> As for flags we'd need to split up flag_unsafe_math_optimizations
> further and introduce -fcontracting-math and/or -fexpanding-math
> (covering x*x*x*x -> t = x*x; t*t and pow(x,4) -> t = x*x; t*t).

Since x*x*x*x is (((x*x)*x)*x) I don't think that transformation is 
actually within the C99 notion of contracting; it's reassociating the 
multiplications rather than evaluating some source expression atomically.

But, yes, some new option would be needed to set the default state of the 
FP_CONTRACT pragma.  With my notion of contracting only within the front 
end when a conforming state of contracting is set (there would also be a 
"fast" state permitting the present contracting outside of source 
expressions, I suppose), this would actually be quite an easy pragma to 
implement (modulo the changes for supporting ternary GIMPLE statements 
everywhere) since only the front end would need to know about it; there 
would be no need to attach pragma states to individual operations through 
the optimizers as there would be for the other floating-point pragmas.
Paolo Bonzini June 23, 2010, 11:57 a.m. UTC | #8
On 06/23/2010 01:38 PM, Joseph S. Myers wrote:
> On Wed, 23 Jun 2010, Richard Guenther wrote:
>
>> PAREN_EXPR is supposed to act as re-association barrier only.
>> So techincally for t = a*b; f = t+c; we could emit t = (a*b); f = (t+c);
>> so the middle-end would see ((a*b)+c) which we could avoid to
>> contract based on the parens.
>
> I'd expect that f = a*b+c; would be expanded to exactly the same t =
> (a*b); f = (t+c); GIMPLE, given that no reassociation is permitted in C99;
> certainly f=a*b*c; would be expanded in a way that reflects that this is
> (a*b)*c and no reassociation is permitted.

With FP_CONTRACT on, you would expand it as t1=a*b;t2=t1+c;f=(t2).

>> -fcontracting-math and/or -fexpanding-math
>> (covering x*x*x*x -> t = x*x; t*t and pow(x,4) -> t = x*x; t*t).
 >
> Since x*x*x*x is (((x*x)*x)*x) I don't think that transformation is
> actually within the C99 notion of contracting; it's reassociating the
> multiplications rather than evaluating some source expression atomically.

The standard says: "the intermediate operations in the contracted 
expression are evaluated as if to infinite precision and range, while 
the final operation is rounded to the format determined by the 
expression evaluation method" and this sounds full of interesting and 
perhaps unwanted cases.

So x*x*x*x -> (x**2)**2 is not a contraction.  But x*x*x*x -> 
__builtin_powi(x,4) may be a contraction depending on the implementation 
of __builtin_powi.

And contracting for example allows to evaluate (x + 1.0) - x as (x - x) 
+ 1.0, which is definitely outside.

> With my notion of contracting only within the front
> end when a conforming state of contracting is set (there would also be a
> "fast" state permitting the present contracting outside of source
> expressions, I suppose), this would actually be quite an easy pragma to
> implement (modulo the changes for supporting ternary GIMPLE statements
> everywhere) since only the front end would need to know about it; there
> would be no need to attach pragma states to individual operations through
> the optimizers as there would be for the other floating-point pragmas.

You don't need pragma states, you just need optimization barriers that 
the front-end can place easily, and a careful distinction throughout the 
folders between what is a contraction and what is not.  Of course you 
can start with the conservative definition that "nothing" is a 
contraction. :)

Paolo

Paolo
Joseph Myers June 23, 2010, 12:16 p.m. UTC | #9
On Wed, 23 Jun 2010, Paolo Bonzini wrote:

> On 06/23/2010 01:38 PM, Joseph S. Myers wrote:
> > On Wed, 23 Jun 2010, Richard Guenther wrote:
> > 
> > > PAREN_EXPR is supposed to act as re-association barrier only.
> > > So techincally for t = a*b; f = t+c; we could emit t = (a*b); f = (t+c);
> > > so the middle-end would see ((a*b)+c) which we could avoid to
> > > contract based on the parens.
> > 
> > I'd expect that f = a*b+c; would be expanded to exactly the same t =
> > (a*b); f = (t+c); GIMPLE, given that no reassociation is permitted in C99;
> > certainly f=a*b*c; would be expanded in a way that reflects that this is
> > (a*b)*c and no reassociation is permitted.
> 
> With FP_CONTRACT on, you would expand it as t1=a*b;t2=t1+c;f=(t2).

This certainly sounds much like the front end having knowledge of the 
particular contracted operations that are available on the target, so as 
not to insert PAREN_EXPR about (a*b).  Or are you saying that in the 
absence of -fassociative-math, PAREN_EXPR would operate as a barrier to 
contraction but that reassociation wouldn't be permitted at all, so that a 
single PAREN_EXPR would go around each maximal source expression within 
which contraction is permitted and that (a*b*c+d) could then be evaluated 
as a fused operation on a*b, c and d but not one on a, b*c and d?
Richard Biener June 23, 2010, 12:28 p.m. UTC | #10
On Wed, Jun 23, 2010 at 2:16 PM, Joseph S. Myers
<joseph@codesourcery.com> wrote:
> On Wed, 23 Jun 2010, Paolo Bonzini wrote:
>
>> On 06/23/2010 01:38 PM, Joseph S. Myers wrote:
>> > On Wed, 23 Jun 2010, Richard Guenther wrote:
>> >
>> > > PAREN_EXPR is supposed to act as re-association barrier only.
>> > > So techincally for t = a*b; f = t+c; we could emit t = (a*b); f = (t+c);
>> > > so the middle-end would see ((a*b)+c) which we could avoid to
>> > > contract based on the parens.
>> >
>> > I'd expect that f = a*b+c; would be expanded to exactly the same t =
>> > (a*b); f = (t+c); GIMPLE, given that no reassociation is permitted in C99;
>> > certainly f=a*b*c; would be expanded in a way that reflects that this is
>> > (a*b)*c and no reassociation is permitted.
>>
>> With FP_CONTRACT on, you would expand it as t1=a*b;t2=t1+c;f=(t2).
>
> This certainly sounds much like the front end having knowledge of the
> particular contracted operations that are available on the target, so as
> not to insert PAREN_EXPR about (a*b).  Or are you saying that in the
> absence of -fassociative-math, PAREN_EXPR would operate as a barrier to
> contraction but that reassociation wouldn't be permitted at all, so that a
> single PAREN_EXPR would go around each maximal source expression within
> which contraction is permitted and that (a*b*c+d) could then be evaluated
> as a fused operation on a*b, c and d but not one on a, b*c and d?

PAREN_EXPR is always a re-association barrier, even with
-fassociative-math.  Without -fassociative-math _all_ reassociation
is prohibited.  So yes, you'd place a PAREN_EXPR around each
maximal source expression to limit contraction.

Richard.

> --
> Joseph S. Myers
> joseph@codesourcery.com
>
Richard Biener June 23, 2010, 12:30 p.m. UTC | #11
On Wed, Jun 23, 2010 at 1:38 PM, Joseph S. Myers
<joseph@codesourcery.com> wrote:
> On Wed, 23 Jun 2010, Richard Guenther wrote:
>
>> PAREN_EXPR is supposed to act as re-association barrier only.
>> So techincally for t = a*b; f = t+c; we could emit t = (a*b); f = (t+c);
>> so the middle-end would see ((a*b)+c) which we could avoid to
>> contract based on the parens.
>
> I'd expect that f = a*b+c; would be expanded to exactly the same t =
> (a*b); f = (t+c); GIMPLE, given that no reassociation is permitted in C99;
> certainly f=a*b*c; would be expanded in a way that reflects that this is
> (a*b)*c and no reassociation is permitted.

We don't generate PAREN_EXPRs from the middle-end.  The
middle-end relies on flag_associative_math to disable reassociation.
PAREN_EXPRs disable reassociation at specific points if
flag_associative_math is on (like for Fortran, where re-association
is generally permitted if the programmer didn't put in explicit
parentheses).

Richard.
Paolo Bonzini June 23, 2010, 12:33 p.m. UTC | #12
On 06/23/2010 02:28 PM, Richard Guenther wrote:
> On Wed, Jun 23, 2010 at 2:16 PM, Joseph S. Myers
>>>> I'd expect that f = a*b+c; would be expanded to exactly the same t =
>>>> (a*b); f = (t+c);
>>>
>>> With FP_CONTRACT on, you would expand it as t1=a*b;t2=t1+c;f=(t2).
>>
>> This certainly sounds much like the front end having knowledge of the
>> particular contracted operations that are available on the target, so as
>> not to insert PAREN_EXPR about (a*b).  Or are you saying that in the
>> absence of -fassociative-math, PAREN_EXPR would operate as a barrier to
>> contraction but that reassociation wouldn't be permitted at all, so that a
>> single PAREN_EXPR would go around each maximal source expression within
>> which contraction is permitted and that (a*b*c+d) could then be evaluated
>> as a fused operation on a*b, c and d but not one on a, b*c and d?
>
> PAREN_EXPR is always a re-association barrier, even with
> -fassociative-math.  Without -fassociative-math _all_ reassociation
> is prohibited.

... and -fcontracting-math would still use PAREN_EXPR as a contraction 
barrier.

> So yes, you'd place a PAREN_EXPR around each
> maximal source expression to limit contraction.

Agreed.

Paolo
diff mbox

Patch

Index: tree-pretty-print.c
===================================================================
--- tree-pretty-print.c	(revision 160997)
+++ tree-pretty-print.c	(working copy)
@@ -1947,6 +1947,26 @@  dump_generic_node (pretty_printer *buffe
       pp_string (buffer, " > ");
       break;
 
+    case WIDEN_MULT_PLUS_EXPR:
+      pp_string (buffer, " WIDEN_MULT_PLUS_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
+
+    case WIDEN_MULT_MINUS_EXPR:
+      pp_string (buffer, " WIDEN_MULT_MINUS_EXPR < ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, TREE_OPERAND (node, 2), spc, flags, false);
+      pp_string (buffer, " > ");
+      break;
+
     case OMP_PARALLEL:
       pp_string (buffer, "#pragma omp parallel");
       dump_omp_clauses (buffer, OMP_PARALLEL_CLAUSES (node), spc, flags);
@@ -2440,6 +2460,8 @@  op_code_prio (enum tree_code code)
     case VEC_WIDEN_MULT_LO_EXPR:
     case WIDEN_MULT_EXPR:
     case DOT_PROD_EXPR:
+    case WIDEN_MULT_PLUS_EXPR:
+    case WIDEN_MULT_MINUS_EXPR:
     case MULT_EXPR:
     case TRUNC_DIV_EXPR:
     case CEIL_DIV_EXPR:
Index: optabs.c
===================================================================
--- optabs.c	(revision 160997)
+++ optabs.c	(working copy)
@@ -407,6 +407,20 @@  optab_for_tree_code (enum tree_code code
     case DOT_PROD_EXPR:
       return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
 
+    case WIDEN_MULT_PLUS_EXPR:
+      return (TYPE_UNSIGNED (type)
+	      ? (TYPE_SATURATING (type)
+		 ? usmadd_widen_optab : umadd_widen_optab)
+	      : (TYPE_SATURATING (type)
+		 ? ssmadd_widen_optab : smadd_widen_optab));
+
+    case WIDEN_MULT_MINUS_EXPR:
+      return (TYPE_UNSIGNED (type)
+	      ? (TYPE_SATURATING (type)
+		 ? usmsub_widen_optab : umsub_widen_optab)
+	      : (TYPE_SATURATING (type)
+		 ? ssmsub_widen_optab : smsub_widen_optab));
+
     case REDUC_MAX_EXPR:
       return TYPE_UNSIGNED (type) ? reduc_umax_optab : reduc_smax_optab;
 
@@ -546,7 +560,12 @@  expand_widen_pattern_expr (sepops ops, r
   tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
   widen_pattern_optab =
     optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
-  icode = (int) optab_handler (widen_pattern_optab, tmode0)->insn_code;
+  if (ops->code == WIDEN_MULT_PLUS_EXPR
+      || ops->code == WIDEN_MULT_MINUS_EXPR)
+    icode = (int) optab_handler (widen_pattern_optab,
+				 TYPE_MODE (TREE_TYPE (ops->op2)))->insn_code;
+  else
+    icode = (int) optab_handler (widen_pattern_optab, tmode0)->insn_code;
   gcc_assert (icode != CODE_FOR_nothing);
   xmode0 = insn_data[icode].operand[1].mode;
 
Index: testsuite/gcc.target/arm/wmul-1.c
===================================================================
--- testsuite/gcc.target/arm/wmul-1.c	(revision 160997)
+++ testsuite/gcc.target/arm/wmul-1.c	(working copy)
@@ -15,4 +15,4 @@  int mac(const short *a, const short *b, 
   return sqr;
 }
 
-/* { dg-final { scan-assembler-times "smulbb" 2 } } */
+/* { dg-final { scan-assembler-times "smlabb" 2 } } */
Index: testsuite/gcc.target/arm/wmul-3.c
===================================================================
--- testsuite/gcc.target/arm/wmul-3.c	(revision 0)
+++ testsuite/gcc.target/arm/wmul-3.c	(revision 0)
@@ -0,0 +1,18 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv6t2" } */
+
+int mac(const short *a, const short *b, int sqr, int *sum)
+{
+  int i;
+  int dotp = *sum;
+
+  for (i = 0; i < 150; i++) {
+    dotp -= b[i] * a[i];
+    sqr -= b[i] * b[i];
+  }
+
+  *sum = dotp;
+  return sqr;
+}
+
+/* { dg-final { scan-assembler-times "smulbb" 2 } } */
Index: testsuite/gcc.target/mips/madd-9.c
===================================================================
--- testsuite/gcc.target/mips/madd-9.c	(revision 0)
+++ testsuite/gcc.target/mips/madd-9.c	(revision 0)
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 isa_rev>=1 -mgp32" } */
+/* { dg-final { scan-assembler-not "\tmul\t" } } */
+/* { dg-final { scan-assembler "\tmadd\t" } } */
+
+NOMIPS16 long long
+f1 (int *a, int *b, int n)
+{
+  long long int x;
+  int i;
+
+  x = 0;
+  for (i = 0; i < n; i++)
+    x += (long long) a[i] * b[i];
+  return x;
+}
Index: tree-ssa-math-opts.c
===================================================================
--- tree-ssa-math-opts.c	(revision 160997)
+++ tree-ssa-math-opts.c	(working copy)
@@ -1262,6 +1262,190 @@  struct gimple_opt_pass pass_optimize_bsw
  }
 };
 
+/* Process a single gimple statement STMT, which has a MULT_EXPR as
+   its rhs, and try to convert it into a WIDEN_MULT_EXPR.  The return
+   value is true iff we converted the statement.  */
+
+static bool
+convert_mult_to_widen (gimple stmt)
+{
+  gimple rhs1_stmt = NULL, rhs2_stmt = NULL;
+  tree type1 = NULL, type2 = NULL;
+  tree rhs1, rhs2, rhs1_convop = NULL, rhs2_convop = NULL;
+  enum tree_code rhs1_code, rhs2_code;
+  tree type;
+
+  type = TREE_TYPE (gimple_assign_lhs (stmt));
+
+  if (TREE_CODE (type) != INTEGER_TYPE)
+    return false;
+
+  rhs1 = gimple_assign_rhs1 (stmt);
+  rhs2 = gimple_assign_rhs2 (stmt);
+
+  if (TREE_CODE (rhs1) == SSA_NAME)
+    {
+      rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
+      if (!is_gimple_assign (rhs1_stmt))
+	return false;
+      rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+      if (!CONVERT_EXPR_CODE_P (rhs1_code))
+	return false;
+      rhs1_convop = gimple_assign_rhs1 (rhs1_stmt);
+      type1 = TREE_TYPE (rhs1_convop);
+      if (TYPE_PRECISION (type1) * 2 != TYPE_PRECISION (type))
+	return false;
+    }
+  else if (TREE_CODE (rhs1) != INTEGER_CST)
+    return false;
+
+  if (TREE_CODE (rhs2) == SSA_NAME)
+    {
+      rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
+      if (!is_gimple_assign (rhs2_stmt))
+	return false;
+      rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
+      if (!CONVERT_EXPR_CODE_P (rhs2_code))
+	return false;
+      rhs2_convop = gimple_assign_rhs1 (rhs2_stmt);
+      type2 = TREE_TYPE (rhs2_convop);
+      if (TYPE_PRECISION (type2) * 2 != TYPE_PRECISION (type))
+	return false;
+    }
+  else if (TREE_CODE (rhs2) != INTEGER_CST)
+    return false;
+
+  if (rhs1_stmt == NULL && rhs2_stmt == NULL)
+    return false;
+
+  /* Verify that the machine can perform a widening multiply in this
+     mode/signedness combination, otherwise this transformation is
+     likely to pessimize code.  */
+  if ((rhs1_stmt == NULL || TYPE_UNSIGNED (type1))
+      && (rhs2_stmt == NULL || TYPE_UNSIGNED (type2))
+      && (optab_handler (umul_widen_optab, TYPE_MODE (type))
+	  ->insn_code == CODE_FOR_nothing))
+    return false;
+  else if ((rhs1_stmt == NULL || !TYPE_UNSIGNED (type1))
+	   && (rhs2_stmt == NULL || !TYPE_UNSIGNED (type2))
+	   && (optab_handler (smul_widen_optab, TYPE_MODE (type))
+	       ->insn_code == CODE_FOR_nothing))
+    return false;
+  else if (rhs1_stmt != NULL && rhs2_stmt != NULL
+	   && (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
+	   && (optab_handler (usmul_widen_optab, TYPE_MODE (type))
+	       ->insn_code == CODE_FOR_nothing))
+    return false;
+
+  if ((rhs1_stmt == NULL && !int_fits_type_p (rhs1, type2))
+      || (rhs2_stmt == NULL && !int_fits_type_p (rhs2, type1)))
+    return false;
+
+  if (rhs1_stmt == NULL)
+    gimple_assign_set_rhs1 (stmt, fold_convert (type2, rhs1));
+  else
+    gimple_assign_set_rhs1 (stmt, rhs1_convop);
+  if (rhs2_stmt == NULL)
+    gimple_assign_set_rhs2 (stmt, fold_convert (type1, rhs2));
+  else
+    gimple_assign_set_rhs2 (stmt, rhs2_convop);
+  gimple_assign_set_rhs_code (stmt, WIDEN_MULT_EXPR);
+  update_stmt (stmt);
+  return true;
+}
+
+/* Process a single gimple statement STMT, which is found at the
+   iterator GSI and has a either a PLUS_EXPR or a MINUS_EXPR as its
+   rhs (given by CODE), and try to convert it into a
+   WIDEN_MULT_PLUS_EXPR or a WIDEN_MULT_MINUS_EXPR.  The return value
+   is true iff we converted the statement.  */
+
+static bool
+convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
+			    enum tree_code code)
+{
+  gimple rhs1_stmt = NULL, rhs2_stmt = NULL;
+  tree type;
+  tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs;
+  enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK;
+  optab this_optab;
+  enum tree_code wmult_code;
+
+  lhs = gimple_assign_lhs (stmt);
+  type = TREE_TYPE (lhs);
+  if (TREE_CODE (type) != INTEGER_TYPE)
+    return false;
+
+  if (code == MINUS_EXPR)
+    wmult_code = WIDEN_MULT_MINUS_EXPR;
+  else
+    wmult_code = WIDEN_MULT_PLUS_EXPR;
+
+  /* Verify that the machine can perform a widening multiply
+     accumulate in this mode/signedness combination, otherwise
+     this transformation is likely to pessimize code.  */
+  this_optab = optab_for_tree_code (wmult_code, type, optab_default);
+  if (optab_handler (this_optab, TYPE_MODE (type))->insn_code
+      == CODE_FOR_nothing)
+    return false;
+
+  rhs1 = gimple_assign_rhs1 (stmt);
+  rhs2 = gimple_assign_rhs2 (stmt);
+
+  if (TREE_CODE (rhs1) == SSA_NAME)
+    {
+      rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
+      if (is_gimple_assign (rhs1_stmt))
+	rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+    }
+  else
+    return false;
+
+  if (TREE_CODE (rhs2) == SSA_NAME)
+    {
+      rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
+      if (is_gimple_assign (rhs2_stmt))
+	rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
+    }
+  else
+    return false;
+
+  if (rhs1_code == MULT_EXPR)
+    {
+      if (!convert_mult_to_widen (rhs1_stmt))
+	return false;
+      rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+    }
+  if (rhs2_code == MULT_EXPR)
+    {
+      if (!convert_mult_to_widen (rhs2_stmt))
+	return false;
+      rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
+    }
+  
+  if (code == PLUS_EXPR && rhs1_code == WIDEN_MULT_EXPR)
+    {
+      mult_rhs1 = gimple_assign_rhs1 (rhs1_stmt);
+      mult_rhs2 = gimple_assign_rhs2 (rhs1_stmt);
+      add_rhs = rhs2;
+    }
+  else if (rhs2_code == WIDEN_MULT_EXPR)
+    {
+      mult_rhs1 = gimple_assign_rhs1 (rhs2_stmt);
+      mult_rhs2 = gimple_assign_rhs2 (rhs2_stmt);
+      add_rhs = rhs1;
+    }
+  else
+    return false;
+
+  /* ??? May need some type verification here?  */
+
+  gimple_assign_set_rhs_with_ops_1 (gsi, wmult_code, mult_rhs1, mult_rhs2,
+				    add_rhs);
+  update_stmt (gsi_stmt (*gsi));
+  return true;
+}
+
 /* Find integer multiplications where the operands are extended from
    smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR
    where appropriate.  */
@@ -1279,94 +1463,19 @@  execute_optimize_widening_mul (void)
       for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
         {
 	  gimple stmt = gsi_stmt (gsi);
-	  gimple rhs1_stmt = NULL, rhs2_stmt = NULL;
-	  tree type, type1 = NULL, type2 = NULL;
-	  tree rhs1, rhs2, rhs1_convop = NULL, rhs2_convop = NULL;
-	  enum tree_code rhs1_code, rhs2_code;
-
-	  if (!is_gimple_assign (stmt)
-	      || gimple_assign_rhs_code (stmt) != MULT_EXPR)
-	    continue;
-
-	  type = TREE_TYPE (gimple_assign_lhs (stmt));
-
-	  if (TREE_CODE (type) != INTEGER_TYPE)
-	    continue;
-
-	  rhs1 = gimple_assign_rhs1 (stmt);
-	  rhs2 = gimple_assign_rhs2 (stmt);
-
-	  if (TREE_CODE (rhs1) == SSA_NAME)
-	    {
-	      rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
-	      if (!is_gimple_assign (rhs1_stmt))
-		continue;
-	      rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
-	      if (!CONVERT_EXPR_CODE_P (rhs1_code))
-		continue;
-	      rhs1_convop = gimple_assign_rhs1 (rhs1_stmt);
-	      type1 = TREE_TYPE (rhs1_convop);
-	      if (TYPE_PRECISION (type1) * 2 != TYPE_PRECISION (type))
-		continue;
-	    }
-	  else if (TREE_CODE (rhs1) != INTEGER_CST)
-	    continue;
-
-	  if (TREE_CODE (rhs2) == SSA_NAME)
-	    {
-	      rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
-	      if (!is_gimple_assign (rhs2_stmt))
-		continue;
-	      rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
-	      if (!CONVERT_EXPR_CODE_P (rhs2_code))
-		continue;
-	      rhs2_convop = gimple_assign_rhs1 (rhs2_stmt);
-	      type2 = TREE_TYPE (rhs2_convop);
-	      if (TYPE_PRECISION (type2) * 2 != TYPE_PRECISION (type))
-		continue;
-	    }
-	  else if (TREE_CODE (rhs2) != INTEGER_CST)
-	    continue;
-
-	  if (rhs1_stmt == NULL && rhs2_stmt == NULL)
-	    continue;
-
-	  /* Verify that the machine can perform a widening multiply in this
-	     mode/signedness combination, otherwise this transformation is
-	     likely to pessimize code.  */
-	  if ((rhs1_stmt == NULL || TYPE_UNSIGNED (type1))
-	      && (rhs2_stmt == NULL || TYPE_UNSIGNED (type2))
-	      && (optab_handler (umul_widen_optab, TYPE_MODE (type))
-		  ->insn_code == CODE_FOR_nothing))
-	    continue;
-	  else if ((rhs1_stmt == NULL || !TYPE_UNSIGNED (type1))
-		   && (rhs2_stmt == NULL || !TYPE_UNSIGNED (type2))
-		   && (optab_handler (smul_widen_optab, TYPE_MODE (type))
-		       ->insn_code == CODE_FOR_nothing))
-	    continue;
-	  else if (rhs1_stmt != NULL && rhs2_stmt != 0
-		   && (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
-		   && (optab_handler (usmul_widen_optab, TYPE_MODE (type))
-		       ->insn_code == CODE_FOR_nothing))
-	    continue;
+	  enum tree_code code;
 
-	  if ((rhs1_stmt == NULL && !int_fits_type_p (rhs1, type2))
-	      || (rhs2_stmt == NULL && !int_fits_type_p (rhs2, type1)))
+	  if (!is_gimple_assign (stmt))
 	    continue;
 
-	  if (rhs1_stmt == NULL)
-	    gimple_assign_set_rhs1 (stmt, fold_convert (type2, rhs1));
-	  else
-	    gimple_assign_set_rhs1 (stmt, rhs1_convop);
-	  if (rhs2_stmt == NULL)
-	    gimple_assign_set_rhs2 (stmt, fold_convert (type1, rhs2));
-	  else
-	    gimple_assign_set_rhs2 (stmt, rhs2_convop);
-	  gimple_assign_set_rhs_code (stmt, WIDEN_MULT_EXPR);
-	  update_stmt (stmt);
-	  changed = true;
+	  code = gimple_assign_rhs_code (stmt);
+	  if (code == MULT_EXPR)
+	    changed |= convert_mult_to_widen (stmt);
+	  else if (code == PLUS_EXPR || code == MINUS_EXPR)
+	    changed |= convert_plusminus_to_widen (&gsi, stmt, code);
 	}
     }
+
   return (changed ? TODO_dump_func | TODO_update_ssa | TODO_verify_ssa
 	  | TODO_verify_stmts : 0);
 }
Index: expr.c
===================================================================
--- expr.c	(revision 160997)
+++ expr.c	(working copy)
@@ -7239,8 +7239,6 @@  expand_expr_real_2 (sepops ops, rtx targ
   rtx subtarget, original_target;
   int ignore;
   bool reduce_bit_field;
-  gimple subexp0_def, subexp1_def;
-  tree top0, top1;
   location_t loc = ops->location;
   tree treeop0, treeop1;
 #define REDUCE_BIT_FIELD(expr)	(reduce_bit_field			  \
@@ -7260,7 +7258,8 @@  expand_expr_real_2 (sepops ops, rtx targ
      exactly those that are valid in gimple expressions that aren't
      GIMPLE_SINGLE_RHS (or invalid).  */
   gcc_assert (get_gimple_rhs_class (code) == GIMPLE_UNARY_RHS
-	      || get_gimple_rhs_class (code) == GIMPLE_BINARY_RHS);
+	      || get_gimple_rhs_class (code) == GIMPLE_BINARY_RHS
+	      || get_gimple_rhs_class (code) == GIMPLE_TERNARY_RHS);
 
   ignore = (target == const0_rtx
 	    || ((CONVERT_EXPR_CODE_P (code)
@@ -7435,58 +7434,6 @@  expand_expr_real_2 (sepops ops, rtx targ
 				    fold_convert_loc (loc, ssizetype,
 						      treeop1));
     case PLUS_EXPR:
-
-      /* Check if this is a case for multiplication and addition.  */
-      if ((TREE_CODE (type) == INTEGER_TYPE
-	   || TREE_CODE (type) == FIXED_POINT_TYPE)
-	  && (subexp0_def = get_def_for_expr (treeop0,
-					      MULT_EXPR)))
-	{
-	  tree subsubexp0, subsubexp1;
-	  gimple subsubexp0_def, subsubexp1_def;
-	  enum tree_code this_code;
-
-	  this_code = TREE_CODE (type) == INTEGER_TYPE ? NOP_EXPR
-						       : FIXED_CONVERT_EXPR;
-	  subsubexp0 = gimple_assign_rhs1 (subexp0_def);
-	  subsubexp0_def = get_def_for_expr (subsubexp0, this_code);
-	  subsubexp1 = gimple_assign_rhs2 (subexp0_def);
-	  subsubexp1_def = get_def_for_expr (subsubexp1, this_code);
-	  if (subsubexp0_def && subsubexp1_def
-	      && (top0 = gimple_assign_rhs1 (subsubexp0_def))
-	      && (top1 = gimple_assign_rhs1 (subsubexp1_def))
-	      && (TYPE_PRECISION (TREE_TYPE (top0))
-		  < TYPE_PRECISION (TREE_TYPE (subsubexp0)))
-	      && (TYPE_PRECISION (TREE_TYPE (top0))
-		  == TYPE_PRECISION (TREE_TYPE (top1)))
-	      && (TYPE_UNSIGNED (TREE_TYPE (top0))
-		  == TYPE_UNSIGNED (TREE_TYPE (top1))))
-	    {
-	      tree op0type = TREE_TYPE (top0);
-	      enum machine_mode innermode = TYPE_MODE (op0type);
-	      bool zextend_p = TYPE_UNSIGNED (op0type);
-	      bool sat_p = TYPE_SATURATING (TREE_TYPE (subsubexp0));
-	      if (sat_p == 0)
-		this_optab = zextend_p ? umadd_widen_optab : smadd_widen_optab;
-	      else
-		this_optab = zextend_p ? usmadd_widen_optab
-				       : ssmadd_widen_optab;
-	      if (mode == GET_MODE_2XWIDER_MODE (innermode)
-		  && (optab_handler (this_optab, mode)->insn_code
-		      != CODE_FOR_nothing))
-		{
-		  expand_operands (top0, top1, NULL_RTX, &op0, &op1,
-				   EXPAND_NORMAL);
-		  op2 = expand_expr (treeop1, subtarget,
-				     VOIDmode, EXPAND_NORMAL);
-		  temp = expand_ternary_op (mode, this_optab, op0, op1, op2,
-					    target, unsignedp);
-		  gcc_assert (temp);
-		  return REDUCE_BIT_FIELD (temp);
-		}
-	    }
-	}
-
       /* If we are adding a constant, a VAR_DECL that is sp, fp, or ap, and
 	 something else, make sure we add the register to the constant and
 	 then to the other thing.  This case can occur during strength
@@ -7601,57 +7548,6 @@  expand_expr_real_2 (sepops ops, rtx targ
       return REDUCE_BIT_FIELD (simplify_gen_binary (PLUS, mode, op0, op1));
 
     case MINUS_EXPR:
-      /* Check if this is a case for multiplication and subtraction.  */
-      if ((TREE_CODE (type) == INTEGER_TYPE
-	   || TREE_CODE (type) == FIXED_POINT_TYPE)
-	  && (subexp1_def = get_def_for_expr (treeop1,
-					      MULT_EXPR)))
-	{
-	  tree subsubexp0, subsubexp1;
-	  gimple subsubexp0_def, subsubexp1_def;
-	  enum tree_code this_code;
-
-	  this_code = TREE_CODE (type) == INTEGER_TYPE ? NOP_EXPR
-						       : FIXED_CONVERT_EXPR;
-	  subsubexp0 = gimple_assign_rhs1 (subexp1_def);
-	  subsubexp0_def = get_def_for_expr (subsubexp0, this_code);
-	  subsubexp1 = gimple_assign_rhs2 (subexp1_def);
-	  subsubexp1_def = get_def_for_expr (subsubexp1, this_code);
-	  if (subsubexp0_def && subsubexp1_def
-	      && (top0 = gimple_assign_rhs1 (subsubexp0_def))
-	      && (top1 = gimple_assign_rhs1 (subsubexp1_def))
-	      && (TYPE_PRECISION (TREE_TYPE (top0))
-		  < TYPE_PRECISION (TREE_TYPE (subsubexp0)))
-	      && (TYPE_PRECISION (TREE_TYPE (top0))
-		  == TYPE_PRECISION (TREE_TYPE (top1)))
-	      && (TYPE_UNSIGNED (TREE_TYPE (top0))
-		  == TYPE_UNSIGNED (TREE_TYPE (top1))))
-	    {
-	      tree op0type = TREE_TYPE (top0);
-	      enum machine_mode innermode = TYPE_MODE (op0type);
-	      bool zextend_p = TYPE_UNSIGNED (op0type);
-	      bool sat_p = TYPE_SATURATING (TREE_TYPE (subsubexp0));
-	      if (sat_p == 0)
-		this_optab = zextend_p ? umsub_widen_optab : smsub_widen_optab;
-	      else
-		this_optab = zextend_p ? usmsub_widen_optab
-				       : ssmsub_widen_optab;
-	      if (mode == GET_MODE_2XWIDER_MODE (innermode)
-		  && (optab_handler (this_optab, mode)->insn_code
-		      != CODE_FOR_nothing))
-		{
-		  expand_operands (top0, top1, NULL_RTX, &op0, &op1,
-				   EXPAND_NORMAL);
-		  op2 = expand_expr (treeop0, subtarget,
-				     VOIDmode, EXPAND_NORMAL);
-		  temp = expand_ternary_op (mode, this_optab, op0, op1, op2,
-					    target, unsignedp);
-		  gcc_assert (temp);
-		  return REDUCE_BIT_FIELD (temp);
-		}
-	    }
-	}
-
       /* For initializers, we are allowed to return a MINUS of two
 	 symbolic constants.  Here we handle all cases when both operands
 	 are constant.  */
@@ -7692,6 +7588,14 @@  expand_expr_real_2 (sepops ops, rtx targ
 
       goto binop2;
 
+    case WIDEN_MULT_PLUS_EXPR:
+    case WIDEN_MULT_MINUS_EXPR:
+      expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, EXPAND_NORMAL);
+      op2 = expand_normal (ops->op2);
+      target = expand_widen_pattern_expr (ops, op0, op1, op2,
+					  target, unsignedp);
+      return target;
+
     case WIDEN_MULT_EXPR:
       /* If first operand is constant, swap them.
 	 Thus the following special case checks need only
Index: gimple-pretty-print.c
===================================================================
--- gimple-pretty-print.c	(revision 160997)
+++ gimple-pretty-print.c	(working copy)
@@ -377,6 +377,34 @@  dump_binary_rhs (pretty_printer *buffer,
     }
 }
 
+/* Helper for dump_gimple_assign.  Print the ternary RHS of the
+   assignment GS.  BUFFER, SPC and FLAGS are as in dump_gimple_stmt.  */
+
+static void
+dump_ternary_rhs (pretty_printer *buffer, gimple gs, int spc, int flags)
+{
+  const char *p;
+  enum tree_code code = gimple_assign_rhs_code (gs);
+  switch (code)
+    {
+    case WIDEN_MULT_PLUS_EXPR:
+    case WIDEN_MULT_MINUS_EXPR:
+      for (p = tree_code_name [(int) code]; *p; p++)
+	pp_character (buffer, TOUPPER (*p));
+      pp_string (buffer, " <");
+      dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false);
+      pp_string (buffer, ", ");
+      dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false);
+      pp_character (buffer, '>');
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+}
+
 
 /* Dump the gimple assignment GS.  BUFFER, SPC and FLAGS are as in
    dump_gimple_stmt.  */
@@ -419,6 +447,8 @@  dump_gimple_assign (pretty_printer *buff
         dump_unary_rhs (buffer, gs, spc, flags);
       else if (gimple_num_ops (gs) == 3)
         dump_binary_rhs (buffer, gs, spc, flags);
+      else if (gimple_num_ops (gs) == 4)
+        dump_ternary_rhs (buffer, gs, spc, flags);
       else
         gcc_unreachable ();
       if (!(flags & TDF_RHS_ONLY))
Index: tree.def
===================================================================
--- tree.def	(revision 160997)
+++ tree.def	(working copy)
@@ -1080,6 +1080,16 @@  DEFTREECODE (WIDEN_SUM_EXPR, "widen_sum_
    the arguments from type t1 to type t2, and then multiplying them.  */
 DEFTREECODE (WIDEN_MULT_EXPR, "widen_mult_expr", tcc_binary, 2)
 
+/* Widening multiply-accumulate.
+   The first two arguments are of type t1.
+   The third argument and the result are of type t2, such as t2 is at least
+   twice the size of t1.  This is equivalent to a WIDEN_MULT_EXPR operation
+   followed by an add or subtract.  */
+DEFTREECODE (WIDEN_MULT_PLUS_EXPR, "widen_mult_plus_expr", tcc_expression, 3)
+/* This is like the above, except in the final expression the multiply result
+   is subtracted from t3.  */
+DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "widen_mult_plus_expr", tcc_expression, 3)
+
 /* Whole vector left/right shift in bits.
    Operand 0 is a vector to be shifted.
    Operand 1 is an integer shift amount in bits.  */
Index: gimple-fold.c
===================================================================
--- gimple-fold.c	(revision 160997)
+++ gimple-fold.c	(working copy)
@@ -986,6 +986,9 @@  fold_gimple_assign (gimple_stmt_iterator
         }
       break;
 
+    case GIMPLE_TERNARY_RHS:
+      break;
+
     case GIMPLE_INVALID_RHS:
       gcc_unreachable ();
     }
Index: cfgexpand.c
===================================================================
--- cfgexpand.c	(revision 160997)
+++ cfgexpand.c	(working copy)
@@ -67,7 +67,13 @@  gimple_assign_rhs_to_tree (gimple stmt)
 
   grhs_class = get_gimple_rhs_class (gimple_expr_code (stmt));
 
-  if (grhs_class == GIMPLE_BINARY_RHS)
+  if (grhs_class == GIMPLE_TERNARY_RHS)
+    t = build3 (gimple_assign_rhs_code (stmt),
+		TREE_TYPE (gimple_assign_lhs (stmt)),
+		gimple_assign_rhs1 (stmt),
+		gimple_assign_rhs2 (stmt),
+		gimple_assign_rhs3 (stmt));
+  else if (grhs_class == GIMPLE_BINARY_RHS)
     t = build2 (gimple_assign_rhs_code (stmt),
 		TREE_TYPE (gimple_assign_lhs (stmt)),
 		gimple_assign_rhs1 (stmt),
@@ -1888,6 +1894,9 @@  expand_gimple_stmt_1 (gimple stmt)
 	    ops.type = TREE_TYPE (lhs);
 	    switch (get_gimple_rhs_class (gimple_expr_code (stmt)))
 	      {
+		case GIMPLE_TERNARY_RHS:
+		  ops.op2 = gimple_assign_rhs3 (stmt);
+		  /* Fallthru */
 		case GIMPLE_BINARY_RHS:
 		  ops.op1 = gimple_assign_rhs2 (stmt);
 		  /* Fallthru */
@@ -2238,6 +2247,8 @@  expand_debug_expr (tree exp)
 	{
 	case COND_EXPR:
 	case DOT_PROD_EXPR:
+	case WIDEN_MULT_PLUS_EXPR:
+	case WIDEN_MULT_MINUS_EXPR:
 	  goto ternary;
 
 	case TRUTH_ANDIF_EXPR:
@@ -3024,6 +3035,8 @@  expand_debug_expr (tree exp)
       return NULL;
 
     case WIDEN_MULT_EXPR:
+    case WIDEN_MULT_PLUS_EXPR:
+    case WIDEN_MULT_MINUS_EXPR:
       if (SCALAR_INT_MODE_P (GET_MODE (op0))
 	  && SCALAR_INT_MODE_P (mode))
 	{
@@ -3036,7 +3049,13 @@  expand_debug_expr (tree exp)
 	    op1 = simplify_gen_unary (ZERO_EXTEND, mode, op1, inner_mode);
 	  else
 	    op1 = simplify_gen_unary (SIGN_EXTEND, mode, op1, inner_mode);
-	  return gen_rtx_MULT (mode, op0, op1);
+	  op0 = gen_rtx_MULT (mode, op0, op1);
+	  if (TREE_CODE (exp) == WIDEN_MULT_EXPR)
+	    return op0;
+	  else if (TREE_CODE (exp) == WIDEN_MULT_PLUS_EXPR)
+	    return gen_rtx_PLUS (mode, op0, op2);
+	  else
+	    return gen_rtx_MINUS (mode, op2, op0);
 	}
       return NULL;
 
Index: tree-inline.c
===================================================================
--- tree-inline.c	(revision 160997)
+++ tree-inline.c	(working copy)
@@ -3239,6 +3239,8 @@  estimate_operator_cost (enum tree_code c
     case WIDEN_SUM_EXPR:
     case WIDEN_MULT_EXPR:
     case DOT_PROD_EXPR:
+    case WIDEN_MULT_PLUS_EXPR:
+    case WIDEN_MULT_MINUS_EXPR:
 
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
Index: gimple.c
===================================================================
--- gimple.c	(revision 160997)
+++ gimple.c	(working copy)
@@ -305,31 +305,40 @@  gimple_build_call_from_tree (tree t)
 
 
 /* Extract the operands and code for expression EXPR into *SUBCODE_P,
-   *OP1_P and *OP2_P respectively.  */
+   *OP1_P, *OP2_P and *OP3_P respectively.  */
 
 void
-extract_ops_from_tree (tree expr, enum tree_code *subcode_p, tree *op1_p,
-		       tree *op2_p)
+extract_ops_from_tree_1 (tree expr, enum tree_code *subcode_p, tree *op1_p,
+			 tree *op2_p, tree *op3_p)
 {
   enum gimple_rhs_class grhs_class;
 
   *subcode_p = TREE_CODE (expr);
   grhs_class = get_gimple_rhs_class (*subcode_p);
 
-  if (grhs_class == GIMPLE_BINARY_RHS)
+  if (grhs_class == GIMPLE_TERNARY_RHS)
     {
       *op1_p = TREE_OPERAND (expr, 0);
       *op2_p = TREE_OPERAND (expr, 1);
+      *op3_p = TREE_OPERAND (expr, 2);
+    }
+  else if (grhs_class == GIMPLE_BINARY_RHS)
+    {
+      *op1_p = TREE_OPERAND (expr, 0);
+      *op2_p = TREE_OPERAND (expr, 1);
+      *op3_p = NULL_TREE;
     }
   else if (grhs_class == GIMPLE_UNARY_RHS)
     {
       *op1_p = TREE_OPERAND (expr, 0);
       *op2_p = NULL_TREE;
+      *op3_p = NULL_TREE;
     }
   else if (grhs_class == GIMPLE_SINGLE_RHS)
     {
       *op1_p = expr;
       *op2_p = NULL_TREE;
+      *op3_p = NULL_TREE;
     }
   else
     gcc_unreachable ();
@@ -345,10 +354,10 @@  gimple
 gimple_build_assign_stat (tree lhs, tree rhs MEM_STAT_DECL)
 {
   enum tree_code subcode;
-  tree op1, op2;
+  tree op1, op2, op3;
 
-  extract_ops_from_tree (rhs, &subcode, &op1, &op2);
-  return gimple_build_assign_with_ops_stat (subcode, lhs, op1, op2
+  extract_ops_from_tree_1 (rhs, &subcode, &op1, &op2, &op3);
+  return gimple_build_assign_with_ops_stat (subcode, lhs, op1, op2, op3
   					    PASS_MEM_STAT);
 }
 
@@ -359,7 +368,7 @@  gimple_build_assign_stat (tree lhs, tree
 
 gimple
 gimple_build_assign_with_ops_stat (enum tree_code subcode, tree lhs, tree op1,
-                                   tree op2 MEM_STAT_DECL)
+                                   tree op2, tree op3 MEM_STAT_DECL)
 {
   unsigned num_ops;
   gimple p;
@@ -378,6 +387,12 @@  gimple_build_assign_with_ops_stat (enum 
       gimple_assign_set_rhs2 (p, op2);
     }
 
+  if (op3)
+    {
+      gcc_assert (num_ops > 3);
+      gimple_assign_set_rhs3 (p, op3);
+    }
+
   return p;
 }
 
@@ -1955,22 +1970,22 @@  void
 gimple_assign_set_rhs_from_tree (gimple_stmt_iterator *gsi, tree expr)
 {
   enum tree_code subcode;
-  tree op1, op2;
+  tree op1, op2, op3;
 
-  extract_ops_from_tree (expr, &subcode, &op1, &op2);
-  gimple_assign_set_rhs_with_ops (gsi, subcode, op1, op2);
+  extract_ops_from_tree_1 (expr, &subcode, &op1, &op2, &op3);
+  gimple_assign_set_rhs_with_ops_1 (gsi, subcode, op1, op2, op3);
 }
 
 
 /* Set the RHS of assignment statement pointed-to by GSI to CODE with
-   operands OP1 and OP2.
+   operands OP1, OP2 and OP3.
 
    NOTE: The statement pointed-to by GSI may be reallocated if it
    did not have enough operand slots.  */
 
 void
-gimple_assign_set_rhs_with_ops (gimple_stmt_iterator *gsi, enum tree_code code,
-				tree op1, tree op2)
+gimple_assign_set_rhs_with_ops_1 (gimple_stmt_iterator *gsi, enum tree_code code,
+				  tree op1, tree op2, tree op3)
 {
   unsigned new_rhs_ops = get_gimple_rhs_num_ops (code);
   gimple stmt = gsi_stmt (*gsi);
@@ -1994,6 +2009,8 @@  gimple_assign_set_rhs_with_ops (gimple_s
   gimple_assign_set_rhs1 (stmt, op1);
   if (new_rhs_ops > 1)
     gimple_assign_set_rhs2 (stmt, op2);
+  if (new_rhs_ops > 2)
+    gimple_assign_set_rhs3 (stmt, op3);
 }
 
 
@@ -2473,6 +2490,8 @@  get_gimple_rhs_num_ops (enum tree_code c
     return 1;
   else if (rhs_class == GIMPLE_BINARY_RHS)
     return 2;
+  else if (rhs_class == GIMPLE_TERNARY_RHS)
+    return 3;
   else
     gcc_unreachable ();
 }
@@ -2489,6 +2508,8 @@  get_gimple_rhs_num_ops (enum tree_code c
       || (SYM) == TRUTH_OR_EXPR						    \
       || (SYM) == TRUTH_XOR_EXPR) ? GIMPLE_BINARY_RHS			    \
    : (SYM) == TRUTH_NOT_EXPR ? GIMPLE_UNARY_RHS				    \
+   : ((SYM) == WIDEN_MULT_PLUS_EXPR					    \
+      || (SYM) == WIDEN_MULT_MINUS_EXPR) ? GIMPLE_TERNARY_RHS		    \
    : ((SYM) == COND_EXPR						    \
       || (SYM) == CONSTRUCTOR						    \
       || (SYM) == OBJ_TYPE_REF						    \
Index: gimple.h
===================================================================
--- gimple.h	(revision 160997)
+++ gimple.h	(working copy)
@@ -73,6 +73,7 @@  extern void gimple_check_failed (const_g
 enum gimple_rhs_class
 {
   GIMPLE_INVALID_RHS,	/* The expression cannot be used on the RHS.  */
+  GIMPLE_TERNARY_RHS,	/* The expression is a ternary operation.  */
   GIMPLE_BINARY_RHS,	/* The expression is a binary operation.  */
   GIMPLE_UNARY_RHS,	/* The expression is a unary operation.  */
   GIMPLE_SINGLE_RHS	/* The expression is a single object (an SSA
@@ -799,12 +800,14 @@  gimple gimple_build_return (tree);
 gimple gimple_build_assign_stat (tree, tree MEM_STAT_DECL);
 #define gimple_build_assign(l,r) gimple_build_assign_stat (l, r MEM_STAT_INFO)
 
-void extract_ops_from_tree (tree, enum tree_code *, tree *, tree *);
+void extract_ops_from_tree_1 (tree, enum tree_code *, tree *, tree *, tree *);
 
 gimple gimple_build_assign_with_ops_stat (enum tree_code, tree, tree,
-					  tree MEM_STAT_DECL);
-#define gimple_build_assign_with_ops(c,o1,o2,o3) \
-  gimple_build_assign_with_ops_stat (c, o1, o2, o3 MEM_STAT_INFO)
+					  tree, tree MEM_STAT_DECL);
+#define gimple_build_assign_with_ops(c,o1,o2,o3)			\
+  gimple_build_assign_with_ops_stat (c, o1, o2, o3, NULL_TREE MEM_STAT_INFO)
+#define gimple_build_assign_with_ops3(c,o1,o2,o3,o4)			\
+  gimple_build_assign_with_ops_stat (c, o1, o2, o3, o4 MEM_STAT_INFO)
 
 gimple gimple_build_debug_bind_stat (tree, tree, gimple MEM_STAT_DECL);
 #define gimple_build_debug_bind(var,val,stmt)			\
@@ -866,8 +869,8 @@  bool gimple_assign_single_p (gimple);
 bool gimple_assign_unary_nop_p (gimple);
 void gimple_set_bb (gimple, struct basic_block_def *);
 void gimple_assign_set_rhs_from_tree (gimple_stmt_iterator *, tree);
-void gimple_assign_set_rhs_with_ops (gimple_stmt_iterator *, enum tree_code,
-				     tree, tree);
+void gimple_assign_set_rhs_with_ops_1 (gimple_stmt_iterator *, enum tree_code,
+				       tree, tree, tree);
 tree gimple_get_lhs (const_gimple);
 void gimple_set_lhs (gimple, tree);
 void gimple_replace_lhs (gimple, tree);
@@ -1805,6 +1808,63 @@  gimple_assign_set_rhs2 (gimple gs, tree 
   gimple_set_op (gs, 2, rhs);
 }
 
+/* Return the third operand on the RHS of assignment statement GS.
+   If GS does not have two operands, NULL is returned instead.  */
+
+static inline tree
+gimple_assign_rhs3 (const_gimple gs)
+{
+  GIMPLE_CHECK (gs, GIMPLE_ASSIGN);
+
+  if (gimple_num_ops (gs) >= 4)
+    return gimple_op (gs, 3);
+  else
+    return NULL_TREE;
+}
+
+/* Return a pointer to the third operand on the RHS of assignment
+   statement GS.  */
+
+static inline tree *
+gimple_assign_rhs3_ptr (const_gimple gs)
+{
+  GIMPLE_CHECK (gs, GIMPLE_ASSIGN);
+  return gimple_op_ptr (gs, 3);
+}
+
+
+/* Set RHS to be the third operand on the RHS of assignment statement GS.  */
+
+static inline void
+gimple_assign_set_rhs3 (gimple gs, tree rhs)
+{
+  GIMPLE_CHECK (gs, GIMPLE_ASSIGN);
+
+  gimple_set_op (gs, 3, rhs);
+}
+
+/* A wrapper around gimple_assign_set_rhs_with_ops_1, for callers which expect
+   to see only a maximum of two operands.  */
+
+static inline void
+gimple_assign_set_rhs_with_ops (gimple_stmt_iterator *gsi, enum tree_code code,
+				tree op1, tree op2)
+{
+  gimple_assign_set_rhs_with_ops_1 (gsi, code, op1, op2, NULL);
+}
+
+/* A wrapper around extract_ops_from_tree_1, for callers which expect
+   to see only a maximum of two operands.  */
+
+static inline void
+extract_ops_from_tree (tree expr, enum tree_code *code, tree *op0,
+		       tree *op1)
+{
+  tree op2;
+  extract_ops_from_tree_1 (expr, code, op0, op1, &op2);
+  gcc_assert (op2 == NULL_TREE);
+}
+
 /* Returns true if GS is a nontemporal move.  */
 
 static inline bool
Index: tree-cfg.c
===================================================================
--- tree-cfg.c	(revision 160997)
+++ tree-cfg.c	(working copy)
@@ -3533,6 +3533,67 @@  do_pointer_plus_expr_check:
   return false;
 }
 
+/* Verify a gimple assignment statement STMT with a binary rhs.
+   Returns true if anything is wrong.  */
+
+static bool
+verify_gimple_assign_ternary (gimple stmt)
+{
+  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
+  tree lhs = gimple_assign_lhs (stmt);
+  tree lhs_type = TREE_TYPE (lhs);
+  tree rhs1 = gimple_assign_rhs1 (stmt);
+  tree rhs1_type = TREE_TYPE (rhs1);
+  tree rhs2 = gimple_assign_rhs2 (stmt);
+  tree rhs2_type = TREE_TYPE (rhs2);
+  tree rhs3 = gimple_assign_rhs3 (stmt);
+  tree rhs3_type = TREE_TYPE (rhs3);
+
+  if (!is_gimple_reg (lhs)
+      && !(optimize == 0
+	   && TREE_CODE (lhs_type) == COMPLEX_TYPE))
+    {
+      error ("non-register as LHS of ternary operation");
+      return true;
+    }
+
+  if (!is_gimple_val (rhs1)
+      || !is_gimple_val (rhs2)
+      || !is_gimple_val (rhs3))
+    {
+      error ("invalid operands in ternary operation");
+      return true;
+    }
+
+  /* First handle operations that involve different types.  */
+  switch (rhs_code)
+    {
+    case WIDEN_MULT_PLUS_EXPR:
+    case WIDEN_MULT_MINUS_EXPR:
+      if ((!INTEGRAL_TYPE_P (rhs1_type)
+	   && !FIXED_POINT_TYPE_P (rhs1_type)
+	   && !(TREE_CODE (rhs1_type) == VECTOR_TYPE
+		&& INTEGRAL_TYPE_P (TREE_TYPE (rhs1_type))))
+	  || !useless_type_conversion_p (rhs1_type, rhs2_type)
+	  || !useless_type_conversion_p (lhs_type, rhs3_type)
+	  || 2 * TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (lhs_type)
+	  || TYPE_PRECISION (rhs1_type) != TYPE_PRECISION (rhs2_type))
+	{
+	  error ("type mismatch in widening multiply-accumulate expression");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  debug_generic_expr (rhs3_type);
+	  return true;
+	}
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+  return false;
+}
+
 /* Verify a gimple assignment statement STMT with a single rhs.
    Returns true if anything is wrong.  */
 
@@ -3679,6 +3740,9 @@  verify_gimple_assign (gimple stmt)
     case GIMPLE_BINARY_RHS:
       return verify_gimple_assign_binary (stmt);
 
+    case GIMPLE_TERNARY_RHS:
+      return verify_gimple_assign_ternary (stmt);
+
     default:
       gcc_unreachable ();
     }
Index: config/arm/arm.md
===================================================================
--- config/arm/arm.md	(revision 160997)
+++ config/arm/arm.md	(working copy)
@@ -1422,7 +1422,15 @@  (define_insn "*mulsi3subsi"
    (set_attr "predicable" "yes")]
 )
 
-;; Unnamed template to match long long multiply-accumulate (smlal)
+(define_expand "maddsidi4"
+  [(set (match_operand:DI 0 "s_register_operand" "")
+	(plus:DI
+	 (mult:DI
+	  (sign_extend:DI (match_operand:SI 1 "s_register_operand" ""))
+	  (sign_extend:DI (match_operand:SI 2 "s_register_operand" "")))
+	 (match_operand:DI 3 "s_register_operand" "")))]
+  "TARGET_32BIT && arm_arch3m"
+  "")
 
 (define_insn "*mulsidi3adddi"
   [(set (match_operand:DI 0 "s_register_operand" "=&r")
@@ -1518,7 +1526,15 @@  (define_insn "*umulsidi3_v6"
    (set_attr "predicable" "yes")]
 )
 
-;; Unnamed template to match long long unsigned multiply-accumulate (umlal)
+(define_expand "umaddsidi4"
+  [(set (match_operand:DI 0 "s_register_operand" "")
+	(plus:DI
+	 (mult:DI
+	  (zero_extend:DI (match_operand:SI 1 "s_register_operand" ""))
+	  (zero_extend:DI (match_operand:SI 2 "s_register_operand" "")))
+	 (match_operand:DI 3 "s_register_operand" "")))]
+  "TARGET_32BIT && arm_arch3m"
+  "")
 
 (define_insn "*umulsidi3adddi"
   [(set (match_operand:DI 0 "s_register_operand" "=&r")
@@ -1686,29 +1702,29 @@  (define_insn "*mulhisi3tt"
    (set_attr "predicable" "yes")]
 )
 
-(define_insn "*mulhisi3addsi"
+(define_insn "maddhisi4"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
-	(plus:SI (match_operand:SI 1 "s_register_operand" "r")
+	(plus:SI (match_operand:SI 3 "s_register_operand" "r")
 		 (mult:SI (sign_extend:SI
-			   (match_operand:HI 2 "s_register_operand" "%r"))
+			   (match_operand:HI 1 "s_register_operand" "%r"))
 			  (sign_extend:SI
-			   (match_operand:HI 3 "s_register_operand" "r")))))]
+			   (match_operand:HI 2 "s_register_operand" "r")))))]
   "TARGET_DSP_MULTIPLY"
-  "smlabb%?\\t%0, %2, %3, %1"
+  "smlabb%?\\t%0, %1, %2, %3"
   [(set_attr "insn" "smlaxy")
    (set_attr "predicable" "yes")]
 )
 
-(define_insn "*mulhidi3adddi"
+(define_insn "*maddhidi4"
   [(set (match_operand:DI 0 "s_register_operand" "=r")
 	(plus:DI
-	  (match_operand:DI 1 "s_register_operand" "0")
+	  (match_operand:DI 3 "s_register_operand" "0")
 	  (mult:DI (sign_extend:DI
-	 	    (match_operand:HI 2 "s_register_operand" "%r"))
+	 	    (match_operand:HI 1 "s_register_operand" "%r"))
 		   (sign_extend:DI
-		    (match_operand:HI 3 "s_register_operand" "r")))))]
+		    (match_operand:HI 2 "s_register_operand" "r")))))]
   "TARGET_DSP_MULTIPLY"
-  "smlalbb%?\\t%Q0, %R0, %2, %3"
+  "smlalbb%?\\t%Q0, %R0, %1, %2"
   [(set_attr "insn" "smlalxy")
    (set_attr "predicable" "yes")])
 
Index: tree-ssa-operands.c
===================================================================
--- tree-ssa-operands.c	(revision 160997)
+++ tree-ssa-operands.c	(working copy)
@@ -988,6 +988,8 @@  get_expr_operands (gimple stmt, tree *ex
 
     case DOT_PROD_EXPR:
     case REALIGN_LOAD_EXPR:
+    case WIDEN_MULT_PLUS_EXPR:
+    case WIDEN_MULT_MINUS_EXPR:
       {
 	get_expr_operands (stmt, &TREE_OPERAND (expr, 0), flags);
         get_expr_operands (stmt, &TREE_OPERAND (expr, 1), flags);