Message ID: 502E4F54.9040309@arm.com
State: New
On 17/08/12 15:04, Richard Earnshaw wrote:
> The fix is to make is_widening_mult_p note that it has been called with
> a WIDEN_MULT_EXPR and rather than decompose the operands again, to
> simply extract the existing operands, which have already been formulated
> correctly for a widening multiply operation.

As long as the existing test cases work, I think the only problem with
this idea is if some architecture has a wider range of widening
multiply-and-accumulate than it does plain widening multiply.

If no such architecture exists then this is fine with me, for whatever
that's worth. IIRC, ARM is one of only two architectures (in GCC) with
widening multiplies that widen more than twice the width, and, last I
looked, the only one that has the patterns to use this code, so it's
probably safe.

Andrew
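For context, a minimal C sketch of the kind of source the pass converts in
two steps (names and types here are illustrative, not taken from the thread):
the multiply first becomes a WIDEN_MULT_EXPR, and the following addition is
then folded into a WIDEN_MULT_PLUS_EXPR on targets that provide a widening
multiply-accumulate pattern.

/* Illustrative only: 'a' and 'b' are narrower than the accumulator, so
   the multiply can become a WIDEN_MULT_EXPR (step 1), and adding it to
   'acc' can then become a WIDEN_MULT_PLUS_EXPR (step 2) on targets
   with a widening multiply-accumulate instruction.  */
long long
mac (int a, int b, long long acc)
{
  return acc + (long long) a * b;
}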
On 17/08/12 15:22, Andrew Stubbs wrote:
> On 17/08/12 15:04, Richard Earnshaw wrote:
>> The fix is to make is_widening_mult_p note that it has been called with
>> a WIDEN_MULT_EXPR and rather than decompose the operands again, to
>> simply extract the existing operands, which have already been formulated
>> correctly for a widening multiply operation.
>
> As long as the existing test cases work, I think the only problem with
> this idea is if some architecture has a wider range of widening
> multiply-and-accumulate than it does plain widening multiply.

But surely in that case step1 wouldn't have applied and we'd then be
looking at a MULT_EXPR not a WIDEN_MULT_EXPR.

R.
On 17/08/12 15:31, Richard Earnshaw wrote:
> On 17/08/12 15:22, Andrew Stubbs wrote:
>> On 17/08/12 15:04, Richard Earnshaw wrote:
>>> The fix is to make is_widening_mult_p note that it has been called with
>>> a WIDEN_MULT_EXPR and rather than decompose the operands again, to
>>> simply extract the existing operands, which have already been formulated
>>> correctly for a widening multiply operation.
>>
>> As long as the existing test cases work, I think the only problem with
>> this idea is if some architecture has a wider range of widening
>> multiply-and-accumulate than it does plain widening multiply.
>
> But surely in that case step1 wouldn't have applied and we'd then be
> looking at a MULT_EXPR not a WIDEN_MULT_EXPR.

Not necessarily.

For example, it will use a 32x32->64 signed widening multiply for 16 bit
unsigned data if there is no unsigned 16x16->64 option. Hypothetically,
if there happened to be an unsigned 16x16->64 multiply-and-accumulate
then you'd end up masking it.

Like I said though, cross that bridge if it ever comes to it.

Andrew
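A hedged sketch of the case being described (my example, not Andrew's):
16-bit unsigned operands multiplied into a 64-bit accumulator. With no
unsigned 16x16->64 pattern, the operands get widened to 32 bits first, so
the multiply the pass sees is a signed 32x32->64 one, and a hypothetical
unsigned 16x16->64 multiply-accumulate pattern would never be matched.

/* Illustrative only: a and b are u16 and the product is accumulated
   into a u64.  Without an unsigned 16x16->64 widening multiply pattern
   the operands are widened to 32 bits and a signed 32x32->64 widening
   multiply is used instead, hiding any 16x16->64 MAC the target might
   hypothetically offer.  */
unsigned long long
mac16 (unsigned short a, unsigned short b, unsigned long long acc)
{
  return acc + (unsigned long long) a * b;
}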
On 17/08/12 15:39, Andrew Stubbs wrote:
> On 17/08/12 15:31, Richard Earnshaw wrote:
>> On 17/08/12 15:22, Andrew Stubbs wrote:
>>> On 17/08/12 15:04, Richard Earnshaw wrote:
>>>> The fix is to make is_widening_mult_p note that it has been called with
>>>> a WIDEN_MULT_EXPR and rather than decompose the operands again, to
>>>> simply extract the existing operands, which have already been formulated
>>>> correctly for a widening multiply operation.
>>>
>>> As long as the existing test cases work, I think the only problem with
>>> this idea is if some architecture has a wider range of widening
>>> multiply-and-accumulate than it does plain widening multiply.
>>
>> But surely in that case step1 wouldn't have applied and we'd then be
>> looking at a MULT_EXPR not a WIDEN_MULT_EXPR.
>
> Not necessarily.
>
> For example, it will use a 32x32->64 signed widening multiply for 16 bit
> unsigned data if there is no unsigned 16x16->64 option. Hypothetically,
> if there happened to be an unsigned 16x16->64 multiply-and-accumulate
> then you'd end up masking it.
>
> Like I said though, cross that bridge if it ever comes to it.
>
> Andrew

You've lost me.

If we don't have a 16x16->64 mult operation then after step 1 we'll
still have a MULT_EXPR, not a WIDEN_MULT_EXPR, so when we reach step2
there's nothing to short circuit.

Unless, of course, you're expecting us to get

  step1 -> 16x16->32 widen mult
  step2 -> widen64(step1) + acc64

But even this is only safe iff widen64 is the same type of widening as
the original widening step, and we determine that by looking at the
operands of the widening mult, not at any casts on them.

The key thing is that the type of multiply that we want is based on the
types of the operands, not on the type of the result. So it's essential
we don't strip type conversions beyond the widening conversion.

R.
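Written out as C (illustrative only, not from the thread), the two-step shape
Richard hypothesizes looks roughly like this: a 16x16->32 widening multiply
whose 32-bit result is then widened to 64 bits and accumulated. Whether the
second widening can be folded into a 64-bit multiply-accumulate safely depends
on the signedness of the multiply's operands, not of the 64-bit result.

/* Illustrative only.  */
unsigned long long
mac16_32 (unsigned short a, unsigned short b, unsigned long long acc)
{
  unsigned int prod = (unsigned int) a * b;   /* step1: 16x16->32 widen mult */
  return acc + prod;                          /* step2: widen64(prod) + acc64 */
}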
On 17/08/12 15:47, Richard Earnshaw wrote:
> If we don't have a 16x16->64 mult operation then after step 1 we'll
> still have a MULT_EXPR, not a WIDEN_MULT_EXPR, so when we reach step2
> there's nothing to short circuit.
>
> Unless, of course, you're expecting us to get
>
>   step1 -> 16x16->32 widen mult
>   step2 -> widen64(step1) + acc64

No, given a u16xu16->u64 operation in the code, and that the arch
doesn't have such an opcode, I'd expect to get

  step1 -> (u32)u16 x (u32)u16 -> u64

Likewise, 8x8->32 might give (16)8x(16)8->32.

The code can't see that the widening operation is non-optimal without
looking beyond into its inputs.

Andrew
On 17/08/12 16:06, Andrew Stubbs wrote:
> On 17/08/12 15:47, Richard Earnshaw wrote:
>> If we don't have a 16x16->64 mult operation then after step 1 we'll
>> still have a MULT_EXPR, not a WIDEN_MULT_EXPR, so when we reach step2
>> there's nothing to short circuit.
>>
>> Unless, of course, you're expecting us to get
>>
>>   step1 -> 16x16->32 widen mult
>>   step2 -> widen64(step1) + acc64
>
> No, given a u16xu16->u64 operation in the code, and that the arch
> doesn't have such an opcode, I'd expect to get
>
>   step1 -> (u32)u16 x (u32)u16 -> u64

Hmm, I would have thought that would be more costly than

  (u64)(u16 x u16 -> u32)

> Likewise, 8x8->32 might give (16)8x(16)8->32.
>
> The code can't see that the widening operation is non-optimal without
> looking beyond into its inputs.

Ok, in which case we have to give is_widening_mult_rhs_p enough smarts
to not strip (s32)u32 and return u32. I'll have another think about it.

R.
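The two lowerings being compared, rendered as C purely for illustration
(the real transformation happens on GIMPLE, a compiler may well canonicalize
both functions to the same code, and the names and types are mine):

/* Andrew's expected form: zero-extend the u16 operands to 32 bits and
   use a 32x32->64 widening multiply.  */
unsigned long long
widen_then_mult (unsigned short a, unsigned short b)
{
  return (unsigned long long) ((unsigned int) a) * (unsigned int) b;
}

/* Richard's suggested alternative: a 16x16->32 widening multiply,
   followed by a (frequently cheap) zero-extension of the 32-bit
   product to 64 bits.  */
unsigned long long
mult_then_widen (unsigned short a, unsigned short b)
{
  return (unsigned long long) ((unsigned int) a * b);
}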
On 17/08/12 16:20, Richard Earnshaw wrote:
>> No, given a u16xu16->u64 operation in the code, and that the arch
>> doesn't have such an opcode, I'd expect to get
>>
>>   step1 -> (u32)u16 x (u32)u16 -> u64
>
> Hmm, I would have thought that would be more costly than
>
>   (u64)(u16 x u16 -> u32)

You might be right, but then extends are often free, especially with
unsigned types, so it's hard to say for sure.

Did you reproduce one? It's a long time since I last looked at this
stuff, so I could be confused.

Andrew
--- tree-ssa-math-opts.c	(revision 190462)
+++ tree-ssa-math-opts.c	(local)
@@ -2039,6 +2039,25 @@ is_widening_mult_p (gimple stmt,
       && TREE_CODE (type) != FIXED_POINT_TYPE)
     return false;
 
+  /* If we already have a widening multiply expression, simply extract the
+     operands.  */
+  if (gimple_assign_rhs_code (stmt) == WIDEN_MULT_EXPR)
+    {
+      *rhs1_out = gimple_assign_rhs1 (stmt);
+      *rhs2_out = gimple_assign_rhs2 (stmt);
+      if (TREE_CODE (*rhs1_out) == INTEGER_CST)
+	*type1_out = TREE_TYPE (*rhs2_out);
+      else
+	*type1_out = TREE_TYPE (*rhs1_out);
+
+      if (TREE_CODE (*rhs2_out) == INTEGER_CST)
+	*type2_out = TREE_TYPE (*rhs1_out);
+      else
+	*type2_out = TREE_TYPE (*rhs2_out);
+
+      return true;
+    }
+
   if (!is_widening_mult_rhs_p (type, gimple_assign_rhs1 (stmt),
			       type1_out, rhs1_out))
     return false;

Index: 20120817-1.c
===================================================================
--- 20120817-1.c	(revision 0)
+++ 20120817-1.c	(revision 0)
@@ -0,0 +1,14 @@
+typedef unsigned long long u64;
+unsigned long foo = 0;
+u64 f() __attribute__((noinline));
+
+u64 f() {
+  return ((u64)40) + ((u64) 24) * (int)(foo - 1);
+}
+
+int main ()
+{
+  if (f () != 16)
+    abort ();
+  exit (0);
+}
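For reference, the expected value in the new test works out as follows
(assuming the usual behaviour where converting ULONG_MAX to int yields -1;
the failure mode described is my reading of the test, not stated in the
thread):

  foo == 0, so foo - 1 wraps to ULONG_MAX
  (int)(foo - 1) == -1
  -1 converted to u64 for the multiply == 2^64 - 1
  24 * (2^64 - 1) mod 2^64 == 2^64 - 24
  (2^64 - 24) + 40 == 16

If the widening multiply instead treats the 32-bit operand as unsigned
(the kind of sign mix-up discussed above), it computes
24 * 0xFFFFFFFF + 40 == 0x1800000010, f() no longer returns 16, and the
test aborts.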