Patchwork [(3/7)] Widening multiply-and-accumulate pattern matching

login
register
mail settings
Submitter Andrew Stubbs
Date July 4, 2011, 2:23 p.m.
Message ID <4E11CCD1.4010505@codesourcery.com>
Download mbox | patch
Permalink /patch/103110/
State New
Headers show

Comments

Andrew Stubbs - July 4, 2011, 2:23 p.m.
On 01/07/11 13:25, Richard Guenther wrote:
> Well - some operations work the same on both signedness if you
> just care about the twos-complement result.  This includes
> multiplication (but not for example division).  For this special
> case I suggest to not bother trying to invent a generic predicate
> but do something local in tree-ssa-math-opts.c.

OK, here's my updated patch.

I've taken the view that we *know* what size and signedness the result 
of the multiplication is, and we know what size the input to the 
addition must be, so all the check has to do is make sure it does that 
same conversion, even if by a roundabout means.

What I hadn't grasped before is that when extending a value it's the 
source type that is significant, not the destination, so the checks are 
not as complex as I had thought.

So, this patch adds a test to ensure that:

  1. the type is not truncated so far that we lose any information; and

  2. the type is only ever extended in the proper signedness.

Also, just to be absolutely sure, I've also added a little bit of logic 
to permit extends that are then undone by a truncate. I'm really not 
sure what guarantees there are about what sort of cast sequences can 
exist? Is this necessary? I haven't managed to coax it to generated any 
examples of extends followed by truncates myself, but in any case, it's 
hardly any code and it'll make sure it's future proofed.

OK?

Andrew
Richard Guenther - July 7, 2011, 9:58 a.m.
On Mon, Jul 4, 2011 at 4:23 PM, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 01/07/11 13:25, Richard Guenther wrote:
>>
>> Well - some operations work the same on both signedness if you
>> just care about the twos-complement result.  This includes
>> multiplication (but not for example division).  For this special
>> case I suggest to not bother trying to invent a generic predicate
>> but do something local in tree-ssa-math-opts.c.
>
> OK, here's my updated patch.
>
> I've taken the view that we *know* what size and signedness the result of
> the multiplication is, and we know what size the input to the addition must
> be, so all the check has to do is make sure it does that same conversion,
> even if by a roundabout means.
>
> What I hadn't grasped before is that when extending a value it's the source
> type that is significant, not the destination, so the checks are not as
> complex as I had thought.
>
> So, this patch adds a test to ensure that:
>
>  1. the type is not truncated so far that we lose any information; and
>
>  2. the type is only ever extended in the proper signedness.
>
> Also, just to be absolutely sure, I've also added a little bit of logic to
> permit extends that are then undone by a truncate. I'm really not sure what
> guarantees there are about what sort of cast sequences can exist? Is this
> necessary? I haven't managed to coax it to generated any examples of extends
> followed by truncates myself, but in any case, it's hardly any code and
> it'll make sure it's future proofed.
>
> OK?

I think you should assume that series of widenings, (int)(short)char_variable
are already combined.  Thus I believe you only need to consider a single
conversion in valid_types_for_madd_p.

+/* Check the input types, TYPE1 and TYPE2 to a widening multiply,

what are those types?  Is TYPE1 the result type and TYPE2 the
operand type?  If so why

+  initial_bitsize = TYPE_PRECISION (type1) + TYPE_PRECISION (type2);

this?!

+  initial_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2);

that also looks odd.  So probably TYPE1 isn't the result type.  If they
are the types of the operands, then what operand is EXPR for?

I didn't look at the actual implementation of the function because of the
lack of understanding of the inputs.

-  if (TREE_CODE (rhs1) == SSA_NAME)
+  for (tmp = rhs1, rhs1_code = ERROR_MARK;
+       TREE_CODE (tmp) == SSA_NAME
+       && (CONVERT_EXPR_CODE_P (rhs1_code) || rhs1_code == ERROR_MARK);
+       tmp = gimple_assign_rhs1 (rhs1_stmt))
     {
-      rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
-      if (is_gimple_assign (rhs1_stmt))
-       rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+      rhs1_stmt = SSA_NAME_DEF_STMT (tmp);
+      if (!is_gimple_assign (rhs1_stmt))
+       break;
+      rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
     }

the result looks a bit like spaghetti code ... and lacks a comment
on what it is trying to do.  It looks like it sees through an arbitrary
number of conversions - possibly ones that will make the
macc invalid, as for (short)int-var * short-var + int-var.  So you'll
be pessimizing code by doing that unconditionally.  As I said
above you should at most consider one intermediate conversion.

I believe the code should be arranged such that only valid
conversions are looked through in the first place.  Valid, in
that the resulting types should still match the macc constraints.

Richard.

> Andrew
>
Andrew Stubbs - July 7, 2011, 10:26 a.m.
On 07/07/11 10:58, Richard Guenther wrote:
> I think you should assume that series of widenings, (int)(short)char_variable
> are already combined.  Thus I believe you only need to consider a single
> conversion in valid_types_for_madd_p.

Hmm, I'm not so sure. I'll look into it a bit further.

> +/* Check the input types, TYPE1 and TYPE2 to a widening multiply,
>
> what are those types?  Is TYPE1 the result type and TYPE2 the
> operand type?  If so why

TYPE1 and TYPE2 are the inputs to the multiply. I thought I explained 
that in the comment before the function.

> +  initial_bitsize = TYPE_PRECISION (type1) + TYPE_PRECISION (type2);
>
> this?!

The result of the multiply will be this many bits wide. This may be 
narrower than the type that holds it.

E.g., 16-bit * 8-bit gives a result at most 24-bits wide, which will 
usually be held in a 32- or 64-bit variable.

> +  initial_unsigned = TYPE_UNSIGNED (type1)&&  TYPE_UNSIGNED (type2);
>
> that also looks odd.  So probably TYPE1 isn't the result type.  If they
> are the types of the operands, then what operand is EXPR for?

EXPR, as the comment says, is the addition that follows the multiply.

> -  if (TREE_CODE (rhs1) == SSA_NAME)
> +  for (tmp = rhs1, rhs1_code = ERROR_MARK;
> +       TREE_CODE (tmp) == SSA_NAME
> +&&  (CONVERT_EXPR_CODE_P (rhs1_code) || rhs1_code == ERROR_MARK);
> +       tmp = gimple_assign_rhs1 (rhs1_stmt))
>       {
> -      rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
> -      if (is_gimple_assign (rhs1_stmt))
> -       rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
> +      rhs1_stmt = SSA_NAME_DEF_STMT (tmp);
> +      if (!is_gimple_assign (rhs1_stmt))
> +       break;
> +      rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
>       }
>
> the result looks a bit like spaghetti code ... and lacks a comment
> on what it is trying to do.  It looks like it sees through an arbitrary
> number of conversions - possibly ones that will make the
> macc invalid, as for (short)int-var * short-var + int-var.  So you'll
> be pessimizing code by doing that unconditionally.  As I said
> above you should at most consider one intermediate conversion.

Ok, I need to add a comment here. The code does indeed look back through 
an arbitrary number of conversions. It is searching for the last real 
operation before the addition, hoping to find a multiply.

> I believe the code should be arranged such that only valid
> conversions are looked through in the first place.  Valid, in
> that the resulting types should still match the macc constraints.

Well, it might be possible to discard some conversions initially, but 
until the multiply is found, and it's input types are known, we can't 
know for certain what conversions are valid.

I think I need to explain what's going on here more clearly.

   1. It finds an addition statement. It's not known yet whether it is 
part of a multiply-and-accumulate, or not.

   2. It follows the conversion chain back from each operand to see if 
it finds a multiply, or widening multiply statement.

   3. If it finds a non-widening multiply, it checks it to see if it 
could be widening multiply-and-accumulate (it will already have been 
rejected as a widening multiply on it's own, but the addition might be 
in a wider mode, or the target might provide multiply-and-accumulate 
insns that don't have corresponding widening multiply insns).

   4. (This is the new bit!) It looks to see if there are any 
conversions between the multiply and addition that can safely be ignored.

   5. If we get here, then emit any necessary conversion statements, and 
convert the addition to a WIDEN_MULT_PLUS_EXPR.

Before these changes, any conversion between the multiply and addition 
statements would prevent optimization, even though there are many cases 
where the conversions are valid, and even inserted automatically.

I'm going to go away and find out whether there are really any cases 
where there can legitimately be more than one conversion, and at least 
update my patch with better commenting.

Thanks for you review.

Andrew
Andrew Stubbs - July 7, 2011, 11:43 a.m.
On 07/07/11 11:26, Andrew Stubbs wrote:
> On 07/07/11 10:58, Richard Guenther wrote:
>> I think you should assume that series of widenings,
>> (int)(short)char_variable
>> are already combined.  Thus I believe you only need to consider a single
>> conversion in valid_types_for_madd_p.
>
> Hmm, I'm not so sure. I'll look into it a bit further.

OK, here's a test case that gives multiple conversions:

   long long
   foo (long long a, signed char b, signed char c)
   {
     int bc = b * c;
     return a + (short)bc;
   }

The dump right before the widen_mult pass gives:

   foo (long long int a, signed char b, signed char c)
   {
     int bc;
     long long int D.2018;
     short int D.2017;
     long long int D.2016;
     int D.2015;
     int D.2014;

   <bb 2>:
     D.2014_2 = (int) b_1(D);
     D.2015_4 = (int) c_3(D);
     bc_5 = D.2014_2 * D.2015_4;
     D.2017_6 = (short int) bc_5;
     D.2018_7 = (long long int) D.2017_6;
     D.2016_9 = D.2018_7 + a_8(D);
     return D.2016_9;

   }

Here we have a multiply and accumulate done the long way. The 8-bit 
inputs are widened to 32-bit, multiplied to give a 32-bit result (of 
which only the lower 16-bits contain meaningful data), then truncated to 
16-bits, and sign-extended up to 64-bits ready for the 64-bit addition.

This is slight contrived, perhaps, but not unlike the sort of thing that 
might occur when you have inline functions and macros, and most 
importantly - it is mathematically valid!


So, here's the output from my patched widen_mult pass:

   foo (long long int a, signed char b, signed char c)
   {
     int bc;
     long long int D.2018;
     short int D.2017;
     long long int D.2016;
     int D.2015;
     int D.2014;

   <bb 2>:
     D.2014_2 = (int) b_1(D);
     D.2015_4 = (int) c_3(D);
     bc_5 = b_1(D) w* c_3(D);
     D.2017_6 = (short int) bc_5;
     D.2018_7 = (long long int) D.2017_6;
     D.2016_9 = WIDEN_MULT_PLUS_EXPR <b_1(D), c_3(D), a_8(D)>;
     return D.2016_9;

   }

As you can see, everything except the WIDEN_MULT_PLUS_EXPR statement is 
now redundant. (Ideally, this would be removed now, but in fact it 
doesn't get eliminated until the RTL into_cfglayout pass. This is not 
new behaviour.)


My point is that it's possible to have at least two conversions to 
examine. Is it possible to have more? I don't know, but once I'm dealing 
with two I might as well deal with an arbitrary number.

Andrew
Richard Guenther - July 7, 2011, 12:28 p.m.
On Thu, Jul 7, 2011 at 1:43 PM, Andrew Stubbs <andrew.stubbs@gmail.com> wrote:
> On 07/07/11 11:26, Andrew Stubbs wrote:
>>
>> On 07/07/11 10:58, Richard Guenther wrote:
>>>
>>> I think you should assume that series of widenings,
>>> (int)(short)char_variable
>>> are already combined.  Thus I believe you only need to consider a single
>>> conversion in valid_types_for_madd_p.
>>
>> Hmm, I'm not so sure. I'll look into it a bit further.
>
> OK, here's a test case that gives multiple conversions:
>
>  long long
>  foo (long long a, signed char b, signed char c)
>  {
>    int bc = b * c;
>    return a + (short)bc;
>  }
>
> The dump right before the widen_mult pass gives:
>
>  foo (long long int a, signed char b, signed char c)
>  {
>    int bc;
>    long long int D.2018;
>    short int D.2017;
>    long long int D.2016;
>    int D.2015;
>    int D.2014;
>
>  <bb 2>:
>    D.2014_2 = (int) b_1(D);
>    D.2015_4 = (int) c_3(D);
>    bc_5 = D.2014_2 * D.2015_4;
>    D.2017_6 = (short int) bc_5;

Ok, so you have a truncation that is a no-op value-wise.  I would
argue that this truncation should be removed independent on
whether we have a widening multiply instruction or not.

The technically most capable place to remove non-value-changing
truncations (and combine them with a successive conversion)
would be value-range propagation.  Which already knows:

Value ranges after VRP:

b_1(D): VARYING
D.2698_2: [-128, 127]
c_3(D): VARYING
D.2699_4: [-128, 127]
bc_5: [-16256, 16384]
D.2701_6: [-16256, 16384]
D.2702_7: [-16256, 16384]
a_8(D): VARYING
D.2700_9: VARYING

thus truncating bc_5 to short does not change the value.

The simplification could be made when looking at the
statement

>    D.2018_7 = (long long int) D.2017_6;

in vrp_fold_stmt, based on the fact that this conversion
converts from a value-preserving intermediate conversion.
Thus the transform would replace the D.2017_6 operand
with bc_5.

So yes, the case appears - but it shouldn't ;)

I'll cook up a quick patch for VRP.

Thanks,
Richard.

>    D.2016_9 = D.2018_7 + a_8(D);
>    return D.2016_9;
>
>  }
>
> Here we have a multiply and accumulate done the long way. The 8-bit inputs
> are widened to 32-bit, multiplied to give a 32-bit result (of which only the
> lower 16-bits contain meaningful data), then truncated to 16-bits, and
> sign-extended up to 64-bits ready for the 64-bit addition.
>
> This is slight contrived, perhaps, but not unlike the sort of thing that
> might occur when you have inline functions and macros, and most importantly
> - it is mathematically valid!
>
>
> So, here's the output from my patched widen_mult pass:
>
>  foo (long long int a, signed char b, signed char c)
>  {
>    int bc;
>    long long int D.2018;
>    short int D.2017;
>    long long int D.2016;
>    int D.2015;
>    int D.2014;
>
>  <bb 2>:
>    D.2014_2 = (int) b_1(D);
>    D.2015_4 = (int) c_3(D);
>    bc_5 = b_1(D) w* c_3(D);
>    D.2017_6 = (short int) bc_5;
>    D.2018_7 = (long long int) D.2017_6;
>    D.2016_9 = WIDEN_MULT_PLUS_EXPR <b_1(D), c_3(D), a_8(D)>;
>    return D.2016_9;
>
>  }
>
> As you can see, everything except the WIDEN_MULT_PLUS_EXPR statement is now
> redundant. (Ideally, this would be removed now, but in fact it doesn't get
> eliminated until the RTL into_cfglayout pass. This is not new behaviour.)
>
>
> My point is that it's possible to have at least two conversions to examine.
> Is it possible to have more? I don't know, but once I'm dealing with two I
> might as well deal with an arbitrary number.
>
> Andrew
>
Richard Guenther - July 7, 2011, 12:37 p.m.
On Thu, Jul 7, 2011 at 2:28 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Thu, Jul 7, 2011 at 1:43 PM, Andrew Stubbs <andrew.stubbs@gmail.com> wrote:
>> On 07/07/11 11:26, Andrew Stubbs wrote:
>>>
>>> On 07/07/11 10:58, Richard Guenther wrote:
>>>>
>>>> I think you should assume that series of widenings,
>>>> (int)(short)char_variable
>>>> are already combined.  Thus I believe you only need to consider a single
>>>> conversion in valid_types_for_madd_p.
>>>
>>> Hmm, I'm not so sure. I'll look into it a bit further.
>>
>> OK, here's a test case that gives multiple conversions:
>>
>>  long long
>>  foo (long long a, signed char b, signed char c)
>>  {
>>    int bc = b * c;
>>    return a + (short)bc;
>>  }
>>
>> The dump right before the widen_mult pass gives:
>>
>>  foo (long long int a, signed char b, signed char c)
>>  {
>>    int bc;
>>    long long int D.2018;
>>    short int D.2017;
>>    long long int D.2016;
>>    int D.2015;
>>    int D.2014;
>>
>>  <bb 2>:
>>    D.2014_2 = (int) b_1(D);
>>    D.2015_4 = (int) c_3(D);
>>    bc_5 = D.2014_2 * D.2015_4;
>>    D.2017_6 = (short int) bc_5;
>
> Ok, so you have a truncation that is a no-op value-wise.  I would
> argue that this truncation should be removed independent on
> whether we have a widening multiply instruction or not.
>
> The technically most capable place to remove non-value-changing
> truncations (and combine them with a successive conversion)
> would be value-range propagation.  Which already knows:
>
> Value ranges after VRP:
>
> b_1(D): VARYING
> D.2698_2: [-128, 127]
> c_3(D): VARYING
> D.2699_4: [-128, 127]
> bc_5: [-16256, 16384]
> D.2701_6: [-16256, 16384]
> D.2702_7: [-16256, 16384]
> a_8(D): VARYING
> D.2700_9: VARYING
>
> thus truncating bc_5 to short does not change the value.
>
> The simplification could be made when looking at the
> statement
>
>>    D.2018_7 = (long long int) D.2017_6;
>
> in vrp_fold_stmt, based on the fact that this conversion
> converts from a value-preserving intermediate conversion.
> Thus the transform would replace the D.2017_6 operand
> with bc_5.
>
> So yes, the case appears - but it shouldn't ;)
>
> I'll cook up a quick patch for VRP.

Like the attached.  I'll finish and properly test it.

Richard.
Andrew Stubbs - July 8, 2011, 12:44 p.m.
On 07/07/11 13:37, Richard Guenther wrote:
>> I'll cook up a quick patch for VRP.
>
> Like the attached.  I'll finish and properly test it.

Your patch appears to do the wrong thing for this test case:

int
foo (int a, short b, short c)
{
   int bc = b * c;
   return a + (short)bc;
}

With your patch, the input to the widening-mult pass now looks like this:

foo (int a, short int b, short int c)
{
   int bc;
   int D.2016;
   int D.2015;
   int D.2014;

<bb 2>:
   D.2014_2 = (int) b_1(D);
   D.2015_4 = (int) c_3(D);
   bc_5 = D.2014_2 * D.2015_4;
   D.2016_9 = bc_5 + a_8(D);
   return D.2016_9;

}

It looks like when the user tries to deliberately break the maths your 
patch seems to unbreak it.

Andrew
Richard Guenther - July 8, 2011, 1:21 p.m.
On Fri, Jul 8, 2011 at 2:44 PM, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 07/07/11 13:37, Richard Guenther wrote:
>>>
>>> I'll cook up a quick patch for VRP.
>>
>> Like the attached.  I'll finish and properly test it.
>
> Your patch appears to do the wrong thing for this test case:
>
> int
> foo (int a, short b, short c)
> {
>  int bc = b * c;
>  return a + (short)bc;
> }
>
> With your patch, the input to the widening-mult pass now looks like this:
>
> foo (int a, short int b, short int c)
> {
>  int bc;
>  int D.2016;
>  int D.2015;
>  int D.2014;
>
> <bb 2>:
>  D.2014_2 = (int) b_1(D);
>  D.2015_4 = (int) c_3(D);
>  bc_5 = D.2014_2 * D.2015_4;
>  D.2016_9 = bc_5 + a_8(D);
>  return D.2016_9;
>
> }
>
> It looks like when the user tries to deliberately break the maths your patch
> seems to unbreak it.

Yeah, I fixed that in the checked in version.

Richard.

> Andrew
>

Patch

2011-06-28  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* tree-ssa-math-opts.c (valid_types_for_madd_p): New function.
	(convert_plusminus_to_widen): Use valid_types_for_madd_p to
	identify optimization candidates.

	gcc/testsuite/
	* gcc.target/arm/wmul-5.c: New file.
	* gcc.target/arm/no-wmla-1.c: New file.

---
 .../gcc/testsuite/gcc.target/arm/no-wmla-1.c       |   11 ++
 .../gcc/testsuite/gcc.target/arm/wmul-5.c          |   10 ++
 src/gcc-mainline/gcc/tree-ssa-math-opts.c          |  112 ++++++++++++++++++--
 3 files changed, 123 insertions(+), 10 deletions(-)
 create mode 100644 src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c
 create mode 100644 src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c

diff --git a/src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c
new file mode 100644
index 0000000..17f7427
--- /dev/null
+++ b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/no-wmla-1.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+int
+foo (int a, short b, short c)
+{
+     int bc = b * c;
+        return a + (short)bc;
+}
+
+/* { dg-final { scan-assembler "mul" } } */
diff --git a/src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c
new file mode 100644
index 0000000..65c43e3
--- /dev/null
+++ b/src/gcc-mainline/gcc/testsuite/gcc.target/arm/wmul-5.c
@@ -0,0 +1,10 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, char *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
diff --git a/src/gcc-mainline/gcc/tree-ssa-math-opts.c b/src/gcc-mainline/gcc/tree-ssa-math-opts.c
index d55ba57..5ef7bb4 100644
--- a/src/gcc-mainline/gcc/tree-ssa-math-opts.c
+++ b/src/gcc-mainline/gcc/tree-ssa-math-opts.c
@@ -2085,6 +2085,78 @@  convert_mult_to_widen (gimple stmt)
   return true;
 }
 
+/* Check the input types, TYPE1 and TYPE2 to a widening multiply,
+   and then the convertions between the output of the multiply, and
+   the input to an addition EXPR, to ensure that they are compatible with
+   a widening multiply-and-accumulate.
+
+   This function assumes that expr is a valid string of conversion expressions
+   terminated by a multiplication.
+
+   This function tries NOT to make any (fragile) assumptions about what
+   sequence of conversions can exist in the input.  */
+
+static bool
+valid_types_for_madd_p (tree type1, tree type2, tree expr)
+{
+  gimple stmt, prev_stmt;
+  enum tree_code code, prev_code;
+  tree prev_expr, type, prev_type;
+  int bitsize, prev_bitsize, initial_bitsize, min_bitsize;
+  bool initial_unsigned;
+
+  initial_bitsize = TYPE_PRECISION (type1) + TYPE_PRECISION (type2);
+  initial_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2);
+
+  stmt = SSA_NAME_DEF_STMT (expr);
+  code = gimple_assign_rhs_code (stmt);
+  type = TREE_TYPE (expr);
+  bitsize = TYPE_PRECISION (type);
+  min_bitsize = bitsize;
+
+  if (code == MULT_EXPR || code == WIDEN_MULT_EXPR)
+    return true;
+
+  if (!INTEGRAL_TYPE_P (type)
+      || TYPE_PRECISION (type) < initial_bitsize)
+    return false;
+
+  /* Step through the conversions backwards.  */
+  while (true)
+    {
+      prev_expr = gimple_assign_rhs1 (stmt);
+      prev_stmt = SSA_NAME_DEF_STMT (prev_expr);
+      prev_code = gimple_assign_rhs_code (prev_stmt);
+      prev_type = TREE_TYPE (prev_expr);
+      prev_bitsize = TYPE_PRECISION (prev_type);
+
+      if (prev_code == MULT_EXPR || prev_code == WIDEN_MULT_EXPR)
+	break;
+
+      /* If it's an unsuitable type or a truncate that damages the
+	 original value, then were done.  */
+      if (!INTEGRAL_TYPE_P (prev_type)
+	  || TYPE_PRECISION (prev_type) < initial_bitsize)
+	return false;
+
+      /* If we have the wrong sort of extend for the value, then it
+	 could still be ok if we already saw a truncate that reverses
+	 the effect.  */
+      if (bitsize > prev_bitsize
+	  && TYPE_UNSIGNED (prev_type) != initial_unsigned
+	  && min_bitsize > prev_bitsize)
+	return false;
+
+      stmt = prev_stmt;
+      code = prev_code;
+      type = prev_type;
+      bitsize = prev_bitsize;
+      min_bitsize = bitsize < min_bitsize ? bitsize : min_bitsize;
+    }
+
+  return true;
+}
+
 /* Process a single gimple statement STMT, which is found at the
    iterator GSI and has a either a PLUS_EXPR or a MINUS_EXPR as its
    rhs (given by CODE), and try to convert it into a
@@ -2098,6 +2170,7 @@  convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   gimple rhs1_stmt = NULL, rhs2_stmt = NULL;
   tree type, type1, type2;
   tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs;
+  tree tmp, mult_rhs;
   enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK;
   optab this_optab;
   enum tree_code wmult_code;
@@ -2117,22 +2190,32 @@  convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   rhs1 = gimple_assign_rhs1 (stmt);
   rhs2 = gimple_assign_rhs2 (stmt);
 
-  if (TREE_CODE (rhs1) == SSA_NAME)
+  for (tmp = rhs1, rhs1_code = ERROR_MARK;
+       TREE_CODE (tmp) == SSA_NAME
+       && (CONVERT_EXPR_CODE_P (rhs1_code) || rhs1_code == ERROR_MARK);
+       tmp = gimple_assign_rhs1 (rhs1_stmt))
     {
-      rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
-      if (is_gimple_assign (rhs1_stmt))
-	rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+      rhs1_stmt = SSA_NAME_DEF_STMT (tmp);
+      if (!is_gimple_assign (rhs1_stmt))
+	break;
+      rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
     }
-  else
+
+  if (TREE_CODE (tmp) != SSA_NAME)
     return false;
 
-  if (TREE_CODE (rhs2) == SSA_NAME)
+  for (tmp = rhs2, rhs2_code = ERROR_MARK;
+       TREE_CODE (tmp) == SSA_NAME
+       && (CONVERT_EXPR_CODE_P (rhs2_code) || rhs2_code == ERROR_MARK);
+       tmp = gimple_assign_rhs1 (rhs2_stmt))
     {
-      rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
-      if (is_gimple_assign (rhs2_stmt))
-	rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
+      rhs2_stmt = SSA_NAME_DEF_STMT (tmp);
+      if (!is_gimple_assign (rhs2_stmt))
+	break;
+      rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
     }
-  else
+
+  if (TREE_CODE (tmp) != SSA_NAME)
     return false;
 
   if (code == PLUS_EXPR && rhs1_code == MULT_EXPR)
@@ -2140,6 +2223,7 @@  convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       if (!is_widening_mult_p (rhs1_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
+      mult_rhs = rhs1;
       add_rhs = rhs2;
     }
   else if (rhs2_code == MULT_EXPR)
@@ -2147,6 +2231,7 @@  convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       if (!is_widening_mult_p (rhs2_stmt, &type1, &mult_rhs1,
 			       &type2, &mult_rhs2))
 	return false;
+      mult_rhs = rhs2;
       add_rhs = rhs1;
     }
   else if (code == PLUS_EXPR && rhs1_code == WIDEN_MULT_EXPR)
@@ -2155,6 +2240,7 @@  convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       mult_rhs2 = gimple_assign_rhs2 (rhs1_stmt);
       type1 = TREE_TYPE (mult_rhs1);
       type2 = TREE_TYPE (mult_rhs2);
+      mult_rhs = rhs1;
       add_rhs = rhs2;
     }
   else if (rhs2_code == WIDEN_MULT_EXPR)
@@ -2163,6 +2249,7 @@  convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
       mult_rhs2 = gimple_assign_rhs2 (rhs2_stmt);
       type1 = TREE_TYPE (mult_rhs1);
       type2 = TREE_TYPE (mult_rhs2);
+      mult_rhs = rhs2;
       add_rhs = rhs1;
     }
   else
@@ -2171,6 +2258,11 @@  convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2))
     return false;
 
+  /* Verify that the convertions between the mult and the add doesn't do
+     anything unexpected.  */
+  if (!valid_types_for_madd_p (type1, type2, mult_rhs))
+    return false;
+
   /* Verify that the machine can perform a widening multiply
      accumulate in this mode/signedness combination, otherwise
      this transformation is likely to pessimize code.  */