Patchwork [simplify-rtx] Fix 16-bit -> 64-bit multiply and accumulate

login
register
mail settings
Submitter Andrew Stubbs
Date May 25, 2011, 12:35 p.m.
Message ID <4DDCF7A7.4000200@codesourcery.com>
Download mbox | patch
Permalink /patch/97340/
State New
Headers show

Comments

Andrew Stubbs - May 25, 2011, 12:35 p.m.
On 24/05/11 20:35, Joseph S. Myers wrote:
> On Tue, 24 May 2011, Andrew Stubbs wrote:
>
>> I've created this new, simpler patch that converts
>>
>>    (extend (mult a b))
>>
>> into
>>
>>    (mult (extend a) (extend b))
>>
>> regardless of what 'a' and 'b' might be. (These are then simplified and
>> superfluous extends removed, of course.)
>
> Are there some missing conditions here?  The two aren't equivalent in
> general - (extend:SI (mult:HI a b)) multiplies the HImode values in HImode
> (with modulo arithmetic on overflow) before extending the possibly wrapped
> result to SImode.

Ok, I've now modified my patch to prevent it widening regular multiplies.

It now converts

    (extend (mult (extend a) (extend b)))

to

    (mult (newextend a) (newextend b))

But I also have it convert

    (extend (mult (shift a) (extend b)))

to

    (mult (newextend (shift a)) (newextend b))

The latter case is to catch widening multiplies that extract a subreg 
using a shift. I don't understand why it doesn't just use subreg in the 
first place, but apparently it doesn't, and changing it to do that would 
no doubt break many existing machine descriptions.

The latter case also happens to catch cases where an extend is 
represented by (ashiftrt (ashift x)), which is nice.

I know that, potentially, not all shifted operands are going to be 
widening multiplies, but I *think* this should be safe because other 
random shift values are unlikely to match a real widening mult 
instruction (and if they do then the code would already be broken). If 
somebody knows a reason why this isn't safe then I think I'm going to 
need some help figuring out what conditions to use.

OK?

Andrew
Joseph S. Myers - May 25, 2011, 1:19 p.m.
On Wed, 25 May 2011, Andrew Stubbs wrote:

> I know that, potentially, not all shifted operands are going to be widening
> multiplies, but I *think* this should be safe because other random shift
> values are unlikely to match a real widening mult instruction (and if they do
> then the code would already be broken). If somebody knows a reason why this
> isn't safe then I think I'm going to need some help figuring out what
> conditions to use.

Random supposition like that is not a sensible basis for modifying GCC.

I haven't managed to produce an example of code demonstrating the problem, 
but that's probably because I'm not sufficiently familiar with all the RTL 
optimizers.  Where is the guarantee that the inputs to these functions 
must represent real instructions, or that the outputs will only be used if 
they represent real instructions?  Where are the assertions to ensure that 
wrong code is not quietly generated if this is not the case?  Where is the 
documentation of what instruction patterns it is not permitted to put in 
.md files because they would violate the assumptions about what 
instructions you are permitted to represent in RTL?  How have you checked 
there are no existing problematic instruction patterns?

RTL has defined abstract semantics and RTL transformations should be ones 
that are valid in accordance with those semantics, with proper assertions 
if there are additional constraints on the input passed to a function.  
This means actually counting the numbers of variable bits in the operands 
to determine whether the multiplication could overflow.
Andrew Stubbs - May 25, 2011, 1:38 p.m.
On 25/05/11 14:19, Joseph S. Myers wrote:
> RTL has defined abstract semantics and RTL transformations should be ones
> that are valid in accordance with those semantics, with proper assertions
> if there are additional constraints on the input passed to a function.
> This means actually counting the numbers of variable bits in the operands
> to determine whether the multiplication could overflow.

Ok, fair enough, so how can I identify a valid subreg extraction that is 
defined in terms of shifts?

The case that I care about is simple enough:

    (mult:SI (ashiftrt:SI (reg:SI rM)
                          (const_int 16))
             (sign_extend:SI (subreg:HI (reg:SI rN) 0)))

I guess that's just equivalent to this:

    (mult:SI (sign_extend:SI (subreg:HI (reg:SI rM) 4)))
             (sign_extend:SI (subreg:HI (reg:SI rN) 0)))

but it chooses not to represent it that way, which is less than helpful 
in this case.

So I could just scan for that exact pattern, or perhaps look for shift 
sizes that are half the size of the register, or some such thing, but is 
that general enough? Or is it too general again?

Is there anything else I've missed?

Andrew
Joseph S. Myers - May 25, 2011, 1:47 p.m.
On Wed, 25 May 2011, Andrew Stubbs wrote:

> On 25/05/11 14:19, Joseph S. Myers wrote:
> > RTL has defined abstract semantics and RTL transformations should be ones
> > that are valid in accordance with those semantics, with proper assertions
> > if there are additional constraints on the input passed to a function.
> > This means actually counting the numbers of variable bits in the operands
> > to determine whether the multiplication could overflow.
> 
> Ok, fair enough, so how can I identify a valid subreg extraction that is
> defined in terms of shifts?

The shift must be by a positive constant amount, strictly less than the 
precision (GET_MODE_PRECISION) of the mode (of the value being shifted).  
If that applies, the relevant number of bits is the precision of the mode 
minus the number of bits of the shift.  For an extension, just take the 
number of bits in the inner mode.  Add the two numbers of bits; if the 
result does not exceed the number of bits in the mode (of the operands and 
the multiplication) then the multiplication won't overflow.

As in your patch, either all the operands must be sign-extensions / 
arithmetic shifts (and then the result is equivalent to a widening signed 
multiply), or all must be zero-extensions / logical shifts (and the result 
is a widening unsigned multiply).

Patch

2011-05-25  Bernd Schmidt  <bernds@codesourcery.com>
	    Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* simplify-rtx.c (simplify_unary_operation_1): Canonicalize widening
	multiplies.
	* doc/md.texi (Canonicalization of Instructions): Document widening
	multiply canonicalization.

	gcc/testsuite/
	* gcc.target/arm/mla-2.c: New test.

--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5840,6 +5840,21 @@  Equality comparisons of a group of bits (usually a single bit) with zero
 will be written using @code{zero_extract} rather than the equivalent
 @code{and} or @code{sign_extract} operations.
 
+@cindex @code{mult}, canonicalization of
+@item
+@code{(sign_extend:@var{m1} (mult:@var{m2} (sign_extend:@var{m2} @var{x})
+(sign_extend:@var{m2} @var{y})))} is converted to @code{(mult:@var{m1}
+(sign_extend:@var{m1} @var{x}) (sign_extend:@var{m1} @var{y}))}, and likewise
+for @code{zero_extend}.
+
+@item
+@code{(sign_extend:@var{m1} (mult:@var{m2} (ashiftrt:@var{m2}
+@var{x} @var{s}) (sign_extend:@var{m2} @var{y})))} is converted to
+@code{(mult:@var{m1} (sign_extend:@var{m1} (ashiftrt:@var{m2} @var{x} @var{s}))
+(sign_extend:@var{m1} @var{y}))}, and likewise for patterns using @code{zero_extend}
+and @code{lshiftrt}.  If the second operand of @code{mult} is also a shift,
+then that is extended also.
+
 @end itemize
 
 Further canonicalization rules are defined in the function
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -1000,6 +1000,34 @@  simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op)
 	  && GET_CODE (XEXP (XEXP (op, 0), 1)) == LABEL_REF)
 	return XEXP (op, 0);
 
+      /* Extending a widening multiplication should be canonicalized to
+	 a wider widening multiplication.  */
+      if (GET_CODE (op) == MULT)
+	{
+	  rtx lhs = XEXP (op, 0);
+	  rtx rhs = XEXP (op, 1);
+	  enum rtx_code lcode = GET_CODE (lhs);
+	  enum rtx_code rcode = GET_CODE (rhs);
+
+	  /* Widening multiplies usually extend both operands, but sometimes
+	     they use a shift to extract a portion of a register. We assume
+	     it is safe to widen all such operands because other examples
+	     won't match real instructions.  */
+	  if ((lcode == SIGN_EXTEND || lcode == ASHIFTRT)
+	      && (rcode == SIGN_EXTEND || rcode == ASHIFTRT))
+	    {
+	      enum machine_mode lmode = GET_MODE (lhs);
+	      enum machine_mode rmode = GET_MODE (lhs);
+	      return simplify_gen_binary (MULT, mode,
+					  simplify_gen_unary (SIGN_EXTEND,
+							      mode,
+							      lhs, lmode),
+					  simplify_gen_unary (SIGN_EXTEND,
+							      mode,
+							      rhs, rmode));
+	    }
+	}
+
       /* Check for a sign extension of a subreg of a promoted
 	 variable, where the promotion is sign-extended, and the
 	 target mode is the same as the variable's promotion.  */
@@ -1071,6 +1099,34 @@  simplify_unary_operation_1 (enum rtx_code code, enum machine_mode mode, rtx op)
 	  && GET_MODE_SIZE (mode) <= GET_MODE_SIZE (GET_MODE (XEXP (op, 0))))
 	return rtl_hooks.gen_lowpart_no_emit (mode, op);
 
+      /* Extending a widening multiplication should be canonicalized to
+	 a wider widening multiplication.  */
+      if (GET_CODE (op) == MULT)
+	{
+	  rtx lhs = XEXP (op, 0);
+	  rtx rhs = XEXP (op, 1);
+	  enum rtx_code lcode = GET_CODE (lhs);
+	  enum rtx_code rcode = GET_CODE (rhs);
+
+	  /* Widening multiplies usually extend both operands, but sometimes
+	     they use a shift to extract a portion of a register. We assume
+	     it is safe to widen all such operands because other examples
+	     won't match real instructions.  */
+	  if ((lcode == ZERO_EXTEND || lcode == LSHIFTRT)
+	      && (rcode == ZERO_EXTEND || rcode == LSHIFTRT))
+	    {
+	      enum machine_mode lmode = GET_MODE (lhs);
+	      enum machine_mode rmode = GET_MODE (lhs);
+	      return simplify_gen_binary (MULT, mode,
+					  simplify_gen_unary (ZERO_EXTEND,
+							      mode,
+							      lhs, lmode),
+					  simplify_gen_unary (ZERO_EXTEND,
+							      mode,
+							      rhs, rmode));
+	    }
+	}
+
       /* (zero_extend:M (zero_extend:N <X>)) is (zero_extend:M <X>).  */
       if (GET_CODE (op) == ZERO_EXTEND)
 	return simplify_gen_unary (ZERO_EXTEND, mode, XEXP (op, 0),
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mla-2.c
@@ -0,0 +1,9 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long foolong (long long x, short *a, short *b)
+{
+    return x + *a * *b;
+}
+
+/* { dg-final { scan-assembler "smlalbb" } } */