Patchwork [simplify-rtx] Fix 16-bit -> 64-bit multiply and accumulate

login
register
mail settings
Submitter Andrew Stubbs
Date May 24, 2011, 4:43 p.m.
Message ID <4DDBE035.8050901@codesourcery.com>
Download mbox | patch
Permalink /patch/97180/
State New
Headers show

Comments

Andrew Stubbs - May 24, 2011, 4:43 p.m.
On 03/05/11 10:07, Bernd Schmidt wrote:
> I tried to fix it with the patch below, which unfortunately doesn't work
> since during combine we don't see the SIGN_EXTEND operations inside the
> MULT, but two shift operations instead. Maybe you can complete it from here?

I've tried to make this patch go various ways, and I always find a test 
case that doesn't quite work. I think we're approaching it from the 
wrong angle.

The problem is that widening multiplies are required to be defined as 
(mult (extend ..) (extend ..)), but when the combiner tries to build a 
widening multiply pattern from a regular multiply it naturally ends up 
as (extend (mult .. ..)). The result is that the patch Benrd posted made 
existing widening multiplies wider, but failed to convert regular 
multiplies to widening ones.

I've created this new, simpler patch that converts

   (extend (mult a b))

into

   (mult (extend a) (extend b))

regardless of what 'a' and 'b' might be. (These are then simplified and 
superfluous extends removed, of course.)

I find that this patch fixes all the testcases I have, and permitted me 
to add support for ARM smlalbt/smlaltb/smlaltt also (I'll post that in a 
separate patch).

It does assume that the outer sign_extend/zero_extend indicates the 
inner extend types though, so I'm not sure if there's a problem there?

OK?

Andrew
Joseph S. Myers - May 24, 2011, 7:35 p.m.
On Tue, 24 May 2011, Andrew Stubbs wrote:

> I've created this new, simpler patch that converts
> 
>   (extend (mult a b))
> 
> into
> 
>   (mult (extend a) (extend b))
> 
> regardless of what 'a' and 'b' might be. (These are then simplified and
> superfluous extends removed, of course.)

Are there some missing conditions here?  The two aren't equivalent in 
general - (extend:SI (mult:HI a b)) multiplies the HImode values in HImode 
(with modulo arithmetic on overflow) before extending the possibly wrapped 
result to SImode.  You'd need a and b themselves to be extended from 
narrower modes in such a way that if you interpret the extended values in 
the signedness of the outer extension, the result of the multiplication is 
exactly representable in the mode of the multiplication.  (For example, if 
both values are extended from QImode, and all extensions have the same 
signedness, that would be OK.  There are cases that are OK where not all 
extensions have the same signedness, e.g. (sign_extend:DI (mult:SI a b)) 
where a and b are zero-extended from HImode or QImode, at least one from 
QImode, though there the outer extension is equivalent to a 
zero-extension.)
Andrew Stubbs - May 25, 2011, 8:30 a.m.
On 24/05/11 20:35, Joseph S. Myers wrote:
> On Tue, 24 May 2011, Andrew Stubbs wrote:
>
>> I've created this new, simpler patch that converts
>>
>>    (extend (mult a b))
>>
>> into
>>
>>    (mult (extend a) (extend b))
>>
>> regardless of what 'a' and 'b' might be. (These are then simplified and
>> superfluous extends removed, of course.)
>
> Are there some missing conditions here?  The two aren't equivalent in
> general - (extend:SI (mult:HI a b)) multiplies the HImode values in HImode
> (with modulo arithmetic on overflow) before extending the possibly wrapped
> result to SImode.  You'd need a and b themselves to be extended from
> narrower modes in such a way that if you interpret the extended values in
> the signedness of the outer extension, the result of the multiplication is
> exactly representable in the mode of the multiplication.  (For example, if
> both values are extended from QImode, and all extensions have the same
> signedness, that would be OK.  There are cases that are OK where not all
> extensions have the same signedness, e.g. (sign_extend:DI (mult:SI a b))
> where a and b are zero-extended from HImode or QImode, at least one from
> QImode, though there the outer extension is equivalent to a
> zero-extension.)

So, you're saying that promoting a regular multiply to a widening 
multiply isn't a valid transformation anyway? I suppose that does make 
sense. I knew something was too easy.

OK, I'll go try again. :)

Andrew
Joseph S. Myers - May 25, 2011, 10:16 a.m.
On Wed, 25 May 2011, Andrew Stubbs wrote:

> So, you're saying that promoting a regular multiply to a widening multiply
> isn't a valid transformation anyway? I suppose that does make sense. I knew

In general, yes.  RTL always has modulo semantics (except for division and 
remainder by -1); all optimizations based on undefinedness of overflow (in 
the absence of -fwrapv) happen at tree/GIMPLE level, where signed and 
unsigned types are still distinct.  (So you could promote a regular 
multiply of signed types at GIMPLE level in the absence of 
-fwrapv/-ftrapv, but not at RTL level and not for unsigned types at GIMPLE 
level.)

Patch

2011-05-24  Bernd Schmidt  <bernds@codesourcery.com>
	    Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* simplify-rtx.c (simplify_unary_operation_1): Create a new
	canonical form for widening multiplies.
	* doc/md.texi (Canonicalization of Instructions): Document widening
	multiply canonicalization.

	gcc/testsuite/
	* gcc.target/arm/mla-2.c: New test.

--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5840,6 +5840,11 @@  Equality comparisons of a group of bits (usually a single bit) with zero
 will be written using @code{zero_extract} rather than the equivalent
 @code{and} or @code{sign_extract} operations.
 
+@cindex @code{mult}, canonicalization of
+@item
+@code{(sign_extend:@var{m1} (mult:@var{m2} @var{x} @var{y}))} is converted
+to @code{(mult:@var{m1} (sign_extend:@var{m1} @var{x}) (sign_extend:@var{m1} @var{y}))}, and likewise for @code{zero_extract}.
+
 @end itemize
 
 Further canonicalization rules are defined in the function
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -1000,6 +1000,21 @@  simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op)
 	  && GET_CODE (XEXP (XEXP (op, 0), 1)) == LABEL_REF)
 	return XEXP (op, 0);
 
+      /* Convert (sign_extend (mult ..)) to a canonical widening
+	 mulitplication (mult (sign_extend ..) (sign_extend ..)).  */
+      if (GET_CODE (op) == MULT && GET_MODE (op) < mode)
+	{
+	  rtx lhs = XEXP (op, 0);
+	  rtx rhs = XEXP (op, 1);
+	  enum machine_mode lhs_mode = GET_MODE (lhs);
+	  enum machine_mode rhs_mode = GET_MODE (rhs);
+	  return simplify_gen_binary (MULT, mode,
+				      simplify_gen_unary (SIGN_EXTEND, mode,
+							  lhs, lhs_mode),
+				      simplify_gen_unary (SIGN_EXTEND, mode,
+							  rhs, rhs_mode));
+	}
+
       /* Check for a sign extension of a subreg of a promoted
 	 variable, where the promotion is sign-extended, and the
 	 target mode is the same as the variable's promotion.  */
@@ -1071,6 +1086,21 @@  simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op)
 	  && GET_MODE_SIZE (mode) <= GET_MODE_SIZE (GET_MODE (XEXP (op, 0))))
 	return rtl_hooks.gen_lowpart_no_emit (mode, op);
 
+      /* Convert (zero_extend (mult ..)) to a canonical widening
+	 mulitplication (mult (zero_extend ..) (zero_extend ..)).  */
+      if (GET_CODE (op) == MULT && GET_MODE (op) < mode)
+	{
+	  rtx lhs = XEXP (op, 0);
+	  rtx rhs = XEXP (op, 1);
+	  enum machine_mode lhs_mode = GET_MODE (lhs);
+	  enum machine_mode rhs_mode = GET_MODE (rhs);
+	  return simplify_gen_binary (MULT, mode,
+				      simplify_gen_unary (ZERO_EXTEND, mode,
+							  lhs, lhs_mode),
+				      simplify_gen_unary (ZERO_EXTEND, mode,
+							  rhs, rhs_mode));
+	}
+
       /* (zero_extend:M (zero_extend:N <X>)) is (zero_extend:M <X>).  */
       if (GET_CODE (op) == ZERO_EXTEND)
 	return simplify_gen_unary (ZERO_EXTEND, mode, XEXP (op, 0),
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mla-2.c
@@ -0,0 +1,9 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long foolong (long long x, short *a, short *b)
+{
+    return x + *a * *b;
+}
+
+/* { dg-final { scan-assembler "smlalbb" } } */